greasyfork-org / greasyfork

An online repository of user scripts.
https://greasyfork.org
GNU General Public License v3.0

The data dump is missing #1196

Closed (nropgrammer closed 11 months ago)

nropgrammer commented 11 months ago

There is no db.sql.gz file to download from the data directory referenced here.

JasonBarnabe commented 11 months ago

This is no longer available. I've updated the document.

nropgrammer commented 11 months ago

May I ask why? Having a dump of all scripts would be preferable to scraping the entire site and using up server resources.

JasonBarnabe commented 11 months ago

The intent was that it was an easy way to get a normal-ish site if you were going to send in PRs to this repo, but no one ever used it, so I removed it.

API endpoints exist if you want to pull data from the site.

nropgrammer commented 11 months ago

I'm aware of the API (and I'll be using it), but the API is missing:

Additionally, is the script list API's pagination broken? Currently https://greasyfork.org/scripts.json?meta=1 counts 36769 scripts, and with https://greasyfork.org/scripts.json returning 50 results per page, there should be 736 pages of results. The pagination only goes to page 20, though, and 20 * 50 = 1000, not 36769. Maybe I am misunderstanding the script count?

JasonBarnabe commented 11 months ago

If you have specific requests for the API, please create issues for each one.

I'm not sure what you're doing with the data, but keep in mind that you do not necessarily have permission from the creators (barring any explicit permission given by e.g. a license).

nropgrammer commented 11 months ago

The data is needed to find all @require libraries used. Is the 20-page limit intentional?
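For illustration, here is a minimal sketch of how @require URLs could be pulled out of a single script's source once it has been downloaded. It assumes the usual userscript metadata block delimited by `// ==UserScript==` and `// ==/UserScript==`; the function name `extract_requires` is hypothetical and not part of Greasy Fork or its API.

```python
import re
from typing import List

# Standard userscript metadata block delimiters.
META_START = "// ==UserScript=="
META_END = "// ==/UserScript=="

# Matches a line such as: // @require https://example.com/lib.js
REQUIRE_RE = re.compile(r"^\s*//\s*@require\s+(\S+)")


def extract_requires(source: str) -> List[str]:
    """Return the @require URLs found inside the metadata block, in order."""
    urls = []
    in_block = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == META_START:
            in_block = True
            continue
        if stripped == META_END:
            break  # anything after the metadata block is ordinary code
        if in_block:
            match = REQUIRE_RE.match(line)
            if match:
                urls.append(match.group(1))
    return urls
```

Lines that look like `// @require ...` outside the metadata block are ignored, since they are ordinary comments rather than directives.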

JasonBarnabe commented 11 months ago

Script loading is done through the search code, which defaults to 1000 results. At 50 results per page, that is 20 pages.
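The page counts discussed in this thread can be checked with a short sketch. The numbers (36769 scripts, 50 per page, a 1000-result default) come from the comments above; the helper `page_count` is a hypothetical name, and the sketch assumes the 1000-result limit applies to the whole result set rather than to each page.

```python
import math
from typing import Optional


def page_count(total: int, per_page: int, result_cap: Optional[int] = None) -> int:
    """Number of pages reachable, optionally limited by a cap on total results."""
    reachable = total if result_cap is None else min(total, result_cap)
    return math.ceil(reachable / per_page)


# Without a cap: every script is reachable through pagination.
uncapped_pages = page_count(36769, 50)

# With the search default of 1000 results (assumed to cap the whole listing).
capped_pages = page_count(36769, 50, result_cap=1000)
```

Under these assumptions, `uncapped_pages` is 736 and `capped_pages` is 20, which matches both the expected page count and the observed limit.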