weecology / retriever

Quickly download, clean up, and install public datasets into a database management system
http://data-retriever.org

Add datasets from data hub awesome collection #1404

Open henrykironde opened 4 years ago

henrykironde commented 4 years ago

Collections

mayurdeep commented 4 years ago

Almost all the datasets in the Datahub collection already use their own datapackage.json to define the schema of the data: https://datahub.io/docs/core-data https://frictionlessdata.io/data-packages/

So could we build a module that parses datapackage.json and restructures it into our format?

I can work on that if this seems like a good idea.
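A minimal sketch of what such a module might look like, using only the standard library. The output layout, the function names (`load_descriptor`, `restructure`), and the example descriptor are assumptions for illustration; this is not retriever's actual script format or the approach taken in any PR.

```python
import json


def load_descriptor(path):
    """Read a datapackage.json file into a plain dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def restructure(descriptor):
    """Reshape a Data Package descriptor into a simplified target dict.

    The target layout here is hypothetical and only illustrates the idea.
    """
    tables = []
    for resource in descriptor.get("resources", []):
        schema = resource.get("schema", {})
        fields = [
            # Table Schema treats a missing "type" as "string".
            {"name": field["name"], "type": field.get("type", "string")}
            for field in schema.get("fields", [])
        ]
        tables.append(
            {
                "name": resource.get("name"),
                "url": resource.get("path"),
                "schema": {"fields": fields},
            }
        )
    return {
        "name": descriptor.get("name"),
        "title": descriptor.get("title", ""),
        "resources": tables,
    }


# Made-up descriptor used only to exercise the function.
EXAMPLE = {
    "name": "country-codes",
    "title": "Country codes",
    "resources": [
        {
            "name": "codes",
            "path": "data/codes.csv",
            "schema": {
                "fields": [
                    {"name": "code"},
                    {"name": "population", "type": "integer"},
                ]
            },
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(restructure(EXAMPLE), indent=2))
```

Working on plain dicts keeps the sketch independent of any particular Data Package library; a real module would also need to handle dialects, remote paths, and type conversion.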

henrykironde commented 4 years ago

@pathak-mayurdeep, that's great. We should add all the data packages here: https://github.com/weecology/retriever/blob/master/scripts/datapackages.yml and then create the module.

ethanwhite commented 4 years ago

@pathak-mayurdeep Here's our existing (unmerged) work from a couple of years ago on using existing Frictionless Data data packages: https://github.com/weecology/retriever/pull/980 Feel free to build off this if helpful.

ethanwhite commented 4 years ago

Or maybe this got included in #1010? @henrykironde - #980 suggests that everything went into #1010, but I don't remember us getting all the way to loading external packages. Do you remember the status of that work as of #1010?

henrykironde commented 4 years ago

We had reached a point where we could ingest some of the data, but then the specifications changed. Currently we need to create a dictionary for the data types.
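As a rough illustration, such a dictionary could map Frictionless Table Schema field types to the column types used on our side. The left-hand keys are Table Schema type names; the right-hand values and the helper names (`map_field_type`, `map_schema`) are assumed placeholders, not a confirmed list of retriever types.

```python
# Table Schema field type -> target column type.
# The target names on the right are assumptions for illustration only.
FRICTIONLESS_TO_TARGET = {
    "string": "char",
    "integer": "int",
    "number": "double",
    "boolean": "bool",
    # Dates are kept as text in this sketch; a real mapping may differ.
    "date": "char",
    "datetime": "char",
}


def map_field_type(field, default="char"):
    """Return the target type for one Table Schema field descriptor.

    A field without "type" is treated as "string", and any type that is
    not in the dictionary falls back to `default`.
    """
    source_type = field.get("type", "string")
    return FRICTIONLESS_TO_TARGET.get(source_type, default)


def map_schema(schema):
    """Return (name, target_type) pairs for every field in a schema dict."""
    return [(field["name"], map_field_type(field)) for field in schema.get("fields", [])]


if __name__ == "__main__":
    # Made-up schema used only to exercise the mapping.
    schema = {
        "fields": [
            {"name": "id", "type": "integer"},
            {"name": "mass", "type": "number"},
            {"name": "site"},
            {"name": "location", "type": "geopoint"},
        ]
    }
    print(map_schema(schema))
```

Falling back to a text type for unknown entries keeps ingestion from failing outright when the specification adds new types, at the cost of losing type information for those columns.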

mayurdeep commented 4 years ago

Thanks for all the info. I'll start working on this.

mayurdeep commented 4 years ago

Sorry about the late PR. I was offline for several days due to some personal health issues.

I have used datapackage-py in this. Please let me know what changes are needed.