timwis / jkan

A lightweight, backend-free open data portal, powered by Jekyll
https://jkan.io
MIT License

Document options for harvesting/importing/syncing remote metadata/data #77

Open · JJediny opened this issue 8 years ago

JJediny commented 8 years ago

Part of the appeal of today's distributed content generation is not having to separately maintain metadata and datasets that others already maintain better elsewhere. It's safe to assume that many JKAN use cases will need a hybrid approach: cataloging both datasets maintained on JKAN and datasets from remote sources.

Harvesting or snapshotting datasets can be a version-control nightmare, but it's arguably better than recreating them entirely by hand. However, the documentation could/should cover a few of the best options and processes for getting as close as possible to syncing across multiple remote services. As an alternative or complementary approach, it would also be good to document ways to integrate push notifications or webhooks, for example to run a build and a gulp process that refreshes a remote source, so that there is a repeatable process for the fetch/ingest/transform/import steps. That process could then rebuild a new Docker container with JKAN, or run locally and commit the bulk updates.
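As a rough illustration of the fetch/ingest/transform/import step, here is a minimal Python sketch. It assumes the remote source is a CKAN instance exposing the Action API `package_search` endpoint, and that JKAN datasets are Markdown files with YAML front matter in a `_datasets/` folder. The front-matter keys used (`schema`, `title`, `organization`, `notes`, `license`, `resources`) are assumed to match JKAN's default dataset schema and should be checked against your portal; `source_url` is a hypothetical extra field that records where each record came from. The helper names (`fetch_ckan_packages`, `ckan_to_jkan`, `import_packages`) are illustrative only, not part of JKAN or CKAN.

```python
import json
import urllib.parse
import urllib.request
from pathlib import Path


def fetch_ckan_packages(base_url, query="", rows=100):
    """Fetch package dicts from a CKAN Action API package_search endpoint."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    url = f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed: {url}")
    return payload["result"]["results"]


def yaml_str(value):
    # A JSON string literal is also a valid YAML double-quoted scalar,
    # so json.dumps gives safe quoting without a YAML dependency.
    return json.dumps(value or "")


def ckan_to_jkan(pkg, base_url):
    """Transform one CKAN package dict into the text of a JKAN dataset file."""
    org = (pkg.get("organization") or {}).get("title", "")
    lines = [
        "---",
        "schema: default",
        f"title: {yaml_str(pkg.get('title') or pkg['name'])}",
        f"organization: {yaml_str(org)}",
        f"notes: {yaml_str(pkg.get('notes'))}",
        f"license: {yaml_str(pkg.get('license_title'))}",
        # Hypothetical field: points back at the canonical remote record.
        f"source_url: {yaml_str(base_url.rstrip('/') + '/dataset/' + pkg['name'])}",
    ]
    resources = pkg.get("resources") or []
    if not resources:
        lines.append("resources: []")
    else:
        lines.append("resources:")
        for res in resources:
            lines += [
                f"  - name: {yaml_str(res.get('name') or res.get('url'))}",
                f"    url: {yaml_str(res.get('url'))}",
                f"    format: {yaml_str(res.get('format'))}",
            ]
    lines.append("---")
    return "\n".join(lines) + "\n"


def import_packages(packages, base_url, out_dir="_datasets"):
    """Write one Markdown file per package, overwriting; return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pkg in packages:
        path = out / f"{pkg['name']}.md"
        path.write_text(ckan_to_jkan(pkg, base_url), encoding="utf-8")
        written.append(path)
    return written
```

A job like this could be triggered by a webhook or a scheduled build, e.g. `import_packages(fetch_ckan_packages("https://demo.ckan.org", rows=10), "https://demo.ckan.org")`, after which the changed files are committed. Rewriting whole files on every run keeps the import repeatable, but it also discards any local edits to harvested records.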

JJediny commented 8 years ago

This also raises the need for a canonical source field to identify whether a record on JKAN is maintained externally or internally.
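One way such a field could be used during a sync is sketched below in Python. It assumes a hypothetical front-matter key, `source_url`, that is set only on records harvested from a remote source; records without it are treated as maintained on JKAN itself. A sync job would call `safe_to_overwrite` before replacing a file, so locally curated datasets are never clobbered. The front-matter reader is deliberately naive (top-level `key: value` lines only), and a real implementation would use a proper YAML parser.

```python
import json
from pathlib import Path

# Hypothetical front-matter key; JKAN does not define it by default.
SOURCE_KEY = "source_url"


def read_front_matter_field(text, key):
    """Return a top-level value from a '---'-delimited header, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line.startswith(f"{key}:"):
            value = line[len(key) + 1:].strip()
            if value in ("", "~", "null"):
                return None
            # Double-quoted values are decoded as JSON-style strings.
            if value.startswith('"'):
                return json.loads(value)
            return value.strip("'") or None
    return None


def is_externally_maintained(text):
    """True if the record points at a canonical remote source."""
    return bool(read_front_matter_field(text, SOURCE_KEY))


def safe_to_overwrite(path):
    """A sync may write a file only if it is new or marked as external."""
    p = Path(path)
    if not p.exists():
        return True
    return is_externally_maintained(p.read_text(encoding="utf-8"))
```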

timwis commented 8 years ago

Interesting idea @JJediny. To be honest, I don't have much experience doing that (I couldn't get the CKAN harvester to work). I agree we should document it, though. Would you be open to working on a page in the wiki about it?

(Also, regarding #62, just waiting to hear from you on the dataset slug question)