weecology / retriever

Quickly download, clean up, and install public datasets into a database management system
http://data-retriever.org

More data sources #1575

Open henrykironde opened 3 years ago

henrykironde commented 3 years ago
pri1311 commented 2 years ago

Hey @henrykironde! Would love to start contributing, and I believe adding datasets might be a good place to start. Could I pick one up from the lot or would you be assigning any particular one?

henrykironde commented 2 years ago

Hi @pri1311, feel free to pick any data source. Let me know in case you need any clarification.

pri1311 commented 2 years ago

Let me know in case you need any clarification.

I have added a simple dataset for now to get a basic idea of the repository. If the PR is merged/approved, I will move on to more datasets. I am also particularly interested in a separate open issue: adding support for sequence data.

pri1311 commented 2 years ago

Also, I had one small question. I was going through some of the JSON files in the retriever-recipes repository. A lot of the Kaggle datasets were included. But since Kaggle allows downloading test and train data all at once as a zip file, how will those be added to this package? (I ask because I saw Kaggle mentioned as one of the data sources here.)

henrykironde commented 2 years ago

@pri1311 for sequence data, I have not found suitable sources yet, but you can go for it.

since Kaggle allows downloading test and train data all at once as a zip file,

That is a good case, since we download all the data using one URL. We then extract either all the files or one particular file. Check out the JSON files that use extract for some examples: https://github.com/weecology/retriever-recipes/search?q=extract
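To make the "one archive, extract everything or just one file" idea concrete, here is a minimal sketch using only Python's standard zipfile module. It is not how retriever implements extraction internally; the helper names (build_sample_archive, extract_files) and the train.csv/test.csv contents are hypothetical, and the archive is built in memory so that no real download is needed.

```python
import io
import os
import tempfile
import zipfile


def build_sample_archive():
    """Build a small in-memory zip that stands in for a downloaded archive.

    Hypothetical contents: a train.csv and a test.csv, as in the Kaggle case.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("train.csv", "id,label\n1,a\n")
        archive.writestr("test.csv", "id\n2\n")
    return buffer.getvalue()


def extract_files(archive_bytes, dest, members=None):
    """Extract the named members into dest, or every file if members is None.

    Returns the sorted list of extracted member names.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        names = archive.namelist() if members is None else list(members)
        archive.extractall(path=dest, members=names)
    return sorted(names)


if __name__ == "__main__":
    data = build_sample_archive()
    with tempfile.TemporaryDirectory() as dest:
        # Extract only one particular file from the archive.
        print(extract_files(data, dest, members=["train.csv"]))
        print(sorted(os.listdir(dest)))
    with tempfile.TemporaryDirectory() as dest:
        # Extract every file in the archive.
        print(extract_files(data, dest))
```

In a real recipe the bytes would come from the single download URL rather than from build_sample_archive, and the choice between all files and one file would be made in the recipe itself.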

Let me know in case you have more issues or need clarification.