waldoj opened 9 years ago
Beyond making CKAN more cloud-ready, you may want to take this opportunity to address some items on the CKAN ideas-and-roadmap wishlist as well:
https://github.com/ckan/ideas-and-roadmap
Many of these ideas already come from the global CKAN community, and they highlight the pain points and gaps seen across various CKAN implementations.
The multisite project will certainly go a long way toward democratizing Open Data. To make sure our Open Data future works like the web (data portals linked to one another), you may also want to think about the data layer, and about what ODI can do in this project to encourage federation.
Some other items that come to mind:
It would be great to see a blog post about what you've learned from this process once you're a little further along.
You bet!
You may also want to consider some form of facilitated federation. That is, as each new CKAN Multisite instance is spun up, the publisher could optionally be prompted to join a central registry.
This registry could show the publisher's information. The publisher could even ask that their catalog metadata be made available for harvesting.
Future iterations of the registry can even support federated search across catalogs.
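To make the opt-in registry idea a bit more concrete, here is a minimal in-memory sketch. Everything in it (`CatalogEntry`, `Registry`, the `allow_harvest` flag, keeping dataset titles on each entry) is hypothetical and not part of CKAN or CKAN Multisite; a real registry would harvest and store full catalog metadata rather than a list of titles.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One publisher's listing in a hypothetical central registry."""
    publisher: str
    catalog_url: str                 # e.g. the site's /data.json endpoint
    allow_harvest: bool = False      # opt-in: publisher agrees to be harvested
    dataset_titles: list[str] = field(default_factory=list)


class Registry:
    """In-memory stand-in for a central registry of CKAN Multisite instances."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Registration is optional; sites that never call this stay unlisted.
        self._entries[entry.catalog_url] = entry

    def harvestable(self) -> list[CatalogEntry]:
        # Only publishers who opted in are exposed for harvesting.
        return [e for e in self._entries.values() if e.allow_harvest]

    def search(self, term: str) -> list[tuple[str, str]]:
        """Naive federated search: substring match on titles of harvestable catalogs."""
        term = term.lower()
        return [
            (e.publisher, title)
            for e in self.harvestable()
            for title in e.dataset_titles
            if term in title.lower()
        ]


reg = Registry()
reg.register(CatalogEntry("Springfield", "https://data.springfield.example/data.json",
                          True, ["Bus stops", "Parks"]))
reg.register(CatalogEntry("Shelbyville", "https://data.shelbyville.example/data.json",
                          False, ["Bus routes"]))
print(reg.search("bus"))  # [('Springfield', 'Bus stops')]
```

Shelbyville is registered but did not opt in to harvesting, so its datasets never show up in search results.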
Yes. Dataset registries are really important, and they are quite likely something that is missing for the intended purpose of this project. After all, the host of each CKAN Multisite server surely wants to keep up with all the datasets hosted within its sub-sites. (I know that's not quite what you're describing, but it's the same mechanism.) I intend to bundle the ckanext-datajson extension with this, so of course the host could just poll each site's /data.json file, but a mechanism to (optionally!) ping one or more URLs when a dataset is added or updated seems potentially very useful.
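As a rough sketch of the polling approach, the host could index each site's /data.json by dataset identifier, diff successive polls, and optionally POST the changes to subscriber URLs. This assumes the Project Open Data catalog layout (a top-level `"dataset"` array whose entries carry `"identifier"` and `"modified"`); the function names and the ping payload format are hypothetical, not something ckanext-datajson provides.

```python
import json
import urllib.request


def index_datasets(data_json: str) -> dict[str, str]:
    """Map each dataset's identifier to its 'modified' value."""
    catalog = json.loads(data_json)
    return {d["identifier"]: d.get("modified", "") for d in catalog.get("dataset", [])}


def diff_catalogs(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two polls of the same /data.json file."""
    return {
        "added": sorted(new.keys() - old.keys()),
        # Same identifier, different 'modified' value: treat as updated.
        "updated": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
        "removed": sorted(old.keys() - new.keys()),
    }


def ping(url: str, changes: dict[str, list[str]]) -> None:
    """POST a JSON change summary to a subscriber URL (hypothetical ping format)."""
    body = json.dumps(changes).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


before = index_datasets('{"dataset": [{"identifier": "a", "modified": "2015-01-01"},'
                        ' {"identifier": "b", "modified": "2015-01-01"}]}')
after = index_datasets('{"dataset": [{"identifier": "a", "modified": "2015-02-01"},'
                       ' {"identifier": "c", "modified": "2015-02-01"}]}')
print(diff_catalogs(before, after))
# {'added': ['c'], 'updated': ['a'], 'removed': ['b']}
```

Polling and pinging are complementary here: the diff works even for sites that never configure pings, while `ping` lets a site push changes as soon as they happen.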
If this project achieves its main goal, it's foreseeable that there will be thousands, if not millions, of CKAN data repositories, hopefully organically linked and federated, with each repository ideally closest to, and populated by, the data producer.
So registries are a necessary part of the project, since discoverability problems will naturally follow this explosion of data repositories.
From the data consumer side, ODI may want to think about how to facilitate discovery.
Beyond making sure that ckanext-datajson is bundled in, an effort should be made to make automated creation of expressive catalog/dataset metadata the default, rather than just barebones, manually entered metadata.
ODI may even want to go further than the Project Open Data guidance and include additional metadata that can be computed automatically and used for discovery, such as bounding boxes for geospatial data and the date range a dataset covers.
In our implementation, we try to do this by contextualizing datasets through time and place, along with the usual tags and good metadata publishing practices as espoused by Project Open Data.
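As a sketch of what "automatically computed" metadata could look like, the bounding box and date range can be derived straight from the data. The output follows the Project Open Data v1.1 conventions as we read them (`spatial` as `minLon,minLat,maxLon,maxLat`, `temporal` as an ISO 8601 `start/end` interval); the helper names and sample rows are made up for illustration.

```python
from datetime import date


def bounding_box(points: list[tuple[float, float]]) -> str:
    """Return 'minLon,minLat,maxLon,maxLat' from (lon, lat) pairs.

    Naive: does not handle datasets that cross the antimeridian.
    """
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return f"{min(lons)},{min(lats)},{max(lons)},{max(lats)}"


def temporal_range(dates: list[date]) -> str:
    """Return an ISO 8601 interval 'start/end' covering all dates."""
    return f"{min(dates).isoformat()}/{max(dates).isoformat()}"


# Hypothetical rows from an uploaded dataset.
rows = [
    {"lon": -77.04, "lat": 38.90, "observed": date(2014, 3, 1)},
    {"lon": -76.98, "lat": 38.85, "observed": date(2014, 11, 30)},
    {"lon": -77.01, "lat": 38.95, "observed": date(2014, 6, 15)},
]
extra_metadata = {
    "spatial": bounding_box([(r["lon"], r["lat"]) for r in rows]),
    "temporal": temporal_range([r["observed"] for r in rows]),
}
print(extra_metadata)
# {'spatial': '-77.04,38.85,-76.98,38.95', 'temporal': '2014-03-01/2014-11-30'}
```

Because these values come from the data itself, they stay accurate when a publisher uploads a new version, which is harder to guarantee with hand-entered metadata.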
One other thing that comes to mind, though not necessarily for this first round, is the ability for administrators to scan datasets in the repository for PII or other sensitive information (ideally before they go public). We can encourage good records-management practices, but an additional layer of protection would be welcome.
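A pre-publication scan could start as simply as pattern matching over tabular resources. This sketch flags cells in a CSV that look like email addresses or US Social Security numbers; the patterns and the `scan_csv` helper are illustrative assumptions only, and real PII detection needs much more than a couple of regular expressions.

```python
import csv
import io
import re

# Illustrative patterns only; they will miss many kinds of PII and produce false positives.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_csv(text: str) -> list[tuple[int, str, str]]:
    """Return (row number, column name, pattern name) for every suspected hit."""
    hits = []
    reader = csv.DictReader(io.StringIO(text))
    for row_no, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            if not isinstance(value, str):  # skip missing or overflow cells
                continue
            for name, pattern in PII_PATTERNS.items():
                if pattern.search(value):
                    hits.append((row_no, column, name))
    return hits


sample = "name,contact,note\nAlice,alice@example.org,ok\nBob,n/a,SSN 123-45-6789\n"
print(scan_csv(sample))
# [(2, 'contact', 'email'), (3, 'note', 'us_ssn')]
```

The output is meant for an administrator's review queue rather than for automatic blocking, since false positives are likely.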
@jqnatividad Some of this is related to https://github.com/ckan/ideas-and-roadmap/issues/48 and to your ticket https://github.com/ckan/ideas-and-roadmap/issues/59
Yes. Looking forward to https://github.com/boxkite/ckan-multisite.
Hopefully it will lay the groundwork for these ideas, as well as make administration easier in general (https://github.com/opendata/CKAN-Multisite/issues/8), and the multisite admin can be extended to manage other CKAN .ini settings as well.
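One way a multisite admin might manage per-site .ini settings is to rewrite keys in the `[app:main]` section with Python's `configparser`. This is a hypothetical helper, not part of CKAN Multisite; it assumes each site has its own .ini file and that the settings live in `[app:main]`, as they do in a standard CKAN config.

```python
import configparser
import io

# Trimmed-down example of a CKAN config file for one sub-site.
CKAN_INI = """\
[app:main]
use = egg:ckan
ckan.site_url = http://site1.example.org
ckan.site_title = Site One
cache_dir = %(here)s/data
"""


def update_ckan_settings(ini_text: str, settings: dict[str, str]) -> str:
    """Return ini_text with the given keys set in the [app:main] section."""
    # interpolation=None leaves values such as %(here)s untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case instead of lowercasing it
    parser.read_string(ini_text)
    for key, value in settings.items():
        parser.set("app:main", key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


new_ini = update_ckan_settings(CKAN_INI, {"ckan.site_title": "Site One Open Data"})
print(new_ini)
```

One caveat of this approach: `configparser` drops comments when it writes the file back out, so a production tool would probably want a comment-preserving editor or a template per site.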
Did we get everything? Does anything else need to be done in order to accomplish our goal?