natcap / data.naturalcapitalproject.stanford.edu

Scripts and images to run CKAN using Docker Compose

Upload data to GCS when adding to CKAN #28

Open · phargogh opened 7 months ago

phargogh commented 7 months ago

Use case: public data that should be publicly accessible from the catalog.

Assuming:

Then, when running the upload script to create the package on ckan:

At this time, we still need to determine:

phargogh commented 7 months ago

Look into https://github.com/open-data/ckanext-cloudstorage for a possible alternative that integrates with the existing FileStore API. Maybe this would handle most of the hard work?

phargogh commented 7 months ago

ckanext-cloudstorage is not currently viable

I looked into ckanext-cloudstorage, and unfortunately the package is not a viable solution for us at this time because:

  1. There are two closely related repos that could be used, https://github.com/open-data/ckanext-cloudstorage and https://github.com/TkTech/ckanext-cloudstorage, and they are in slightly different states of maintenance.
  2. The package only supports Python 2 (though a Python 3 PR is in progress: https://github.com/TkTech/ckanext-cloudstorage/pull/42).
  3. The package does not yet support GCS (PR in progress: https://github.com/TkTech/ckanext-cloudstorage/pull/52)

So our best option is to manage file uploads to GCS ourselves and then post the resulting URLs to CKAN as needed. Access controls on the data would then be handled by the storage backend (GCS, GDrive, etc.) rather than by CKAN, which is probably safer anyway under Stanford's risk classifications.
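A minimal sketch of that "upload to GCS, then link from CKAN" flow, assuming the bucket is already configured for public read on the GCS side and that the CKAN site accepts an API token in the `Authorization` header. The helper names (`gcs_public_url`, `upload_to_gcs`, `build_resource_payload`, `create_ckan_resource`), the bucket/object names, and the derived `format` value are all hypothetical illustrations, not part of the existing upload script. It uses the `google-cloud-storage` client for the upload and CKAN's `resource_create` action to register a link-only resource.

```python
"""Sketch: upload a file to GCS, then register its public URL as a CKAN resource.

Hypothetical helpers for illustration only; bucket names, package ids, the CKAN
site URL and the API token are supplied by the caller.
"""
import json
import os
import urllib.parse
import urllib.request


def gcs_public_url(bucket_name: str, object_name: str) -> str:
    """Return the public HTTPS URL of an object in a publicly readable bucket."""
    # Percent-encode the object name but keep "/" so prefixes stay readable.
    return "https://storage.googleapis.com/{}/{}".format(
        bucket_name, urllib.parse.quote(object_name, safe="/"))


def upload_to_gcs(local_path: str, bucket_name: str, object_name: str) -> str:
    """Upload local_path to gs://bucket_name/object_name and return its public URL.

    Needs the google-cloud-storage package and application default credentials.
    Public read access is assumed to be granted on the bucket (GCS side), not by CKAN.
    """
    from google.cloud import storage  # lazy import: the other helpers work without it

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path)
    return gcs_public_url(bucket_name, object_name)


def build_resource_payload(package_id: str, url: str, name: str) -> dict:
    """Build a resource_create body for a link-only resource (no CKAN FileStore upload)."""
    # Hypothetical convention: derive the format label from the file extension.
    fmt = os.path.splitext(name)[1].lstrip(".").upper()
    return {"package_id": package_id, "url": url, "name": name, "format": fmt}


def create_ckan_resource(ckan_url: str, api_token: str, payload: dict) -> dict:
    """POST the payload to CKAN's action API and return the created resource."""
    request = urllib.request.Request(
        ckan_url.rstrip("/") + "/api/3/action/resource_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": api_token},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if not body.get("success"):
        raise RuntimeError("resource_create failed: {}".format(body.get("error")))
    return body["result"]
```

In use, the upload script would call `upload_to_gcs(...)` for each file, pass the returned URL to `build_resource_payload(...)`, and send that payload with `create_ckan_resource(...)`. Because CKAN only stores the URL, whoever can read the bucket can read the data, which matches the idea of leaving access control to the storage backend.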