treflehq / trefle-api

🍀 Trefle is a botanical JSON REST API for plant species, allowing you to search and query over all the registered species, and build the next gardening apps and farming robots.
https://trefle.io
GNU Affero General Public License v3.0

Downloadable archive of Trefle database #81

Open SebastianKG opened 3 years ago

SebastianKG commented 3 years ago

Is your feature request related to a problem? Please describe.

This may be a stretch as a "feature", but I'm still looking for a way to get the underlying dataset as a whole without being rate-limited. I have a long-running crawler for the API and have slowly collected a lot of the data, but it remains incomplete and probably always will: when crawling lawfully with only one API token, the limit is quite strict. Back in August, we discussed a data dump (here: https://github.com/treflehq/trefle-api/issues/44), and the following was said:

We will soon provide an archive of our database for you to download, and thus avoid iterating on all the plants.

Describe the solution you'd like

I'm sure the project is strapped for developer time and this may not be a priority, but I would love to build and publicize some cool high-level analyses of this data, aggregated with Apache Spark. To enable projects like this, a data dump seems necessary (or a much more lenient page size limit, though I expect that would be more expensive for the project).

itsezc commented 3 years ago

Not the most up to date, but this may be of help: https://github.com/treflehq/dump
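
For anyone who wants to aggregate a local copy of the dump instead of paging through the API, here is a minimal Python sketch. It assumes the dump is a delimited text file with a header row and a `family` column. The column names, the tab delimiter, and the sample rows are assumptions for illustration, so check them against the actual file first.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column="family", delimiter="\t"):
    """Count rows per distinct value of `column` in a delimited text stream.

    `column` and `delimiter` are assumptions about the dump layout;
    adjust them to match the real header and separator.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    counts = Counter()
    for row in reader:
        # Missing or empty cells are skipped rather than counted.
        value = (row.get(column) or "").strip()
        if value:
            counts[value] += 1
    return counts


# Tiny in-memory sample standing in for the real file
# (illustrative rows, not taken from the dump).
sample = io.StringIO(
    "scientific_name\tfamily\n"
    "Quercus robur\tFagaceae\n"
    "Fagus sylvatica\tFagaceae\n"
    "Rosa canina\tRosaceae\n"
)
sample_counts = count_by_column(sample)
print(sample_counts.most_common())
```

To run it on a downloaded file, pass an open file handle in place of the sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the dump. The same per-column grouping carries over to Spark if the file outgrows a single machine.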