Closed by hugovk 6 years ago
@hugovk that's valuable information, thank you very much for sharing this.
I'm not keen on hard-coding the top packages, as long as the current system works.
While I don't relish the idea of making this change, BigQuery does allow us to do some interesting things.
For example, judging by this query for the most downloaded packages, we may be able to set a shorter date range (say, a year) so that legacy projects drop down the ranking faster. That's probably worth tackling separately, though. We can stick with all-time totals for the moment.
A shorter time range sounds good.
Attached is the output of running
pypinfo --json --days 365 --limit 360 "" project > 365.json
using the handy https://github.com/ofek/pypinfo.
Here's how the top 10 looks:
⌂63% [hugo:~/github/pypinfo] thousands-seperator* ± pypinfo -th --days 365 --limit 10 "" project
project download_count
--------------- --------------
simplejson 327,946,463
six 214,930,152
python-dateutil 152,089,489
setuptools 149,294,971
botocore 146,935,887
pip 140,216,305
requests 137,229,399
pyasn1 134,867,638
docutils 126,916,467
jmespath 117,212,884
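If it helps, here is a minimal sketch of turning the plain-text table above into a Python list of names, for example to seed a package list. The function name parse_pypinfo_table is hypothetical (it is not part of pypinfo), and it assumes the two-column "project download_count" layout shown above with comma thousands separators.

```python
def parse_pypinfo_table(text):
    """Parse a two-column pypinfo text table into (project, download_count) pairs.

    Hypothetical helper: assumes each data line is "<project> <count>",
    with the count written using comma thousands separators.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data lines have exactly two whitespace-separated fields.
        if len(parts) != 2:
            continue
        name, count = parts
        digits = count.replace(",", "")
        # Skip the header line and the dashed rule, whose second field is not numeric.
        if not digits.isdigit():
            continue
        rows.append((name, int(digits)))
    return rows


sample = """\
project download_count
--------------- --------------
simplejson 327,946,463
six 214,930,152
python-dateutil 152,089,489
"""

top = parse_pypinfo_table(sample)
print([name for name, _ in top])
```

For anything beyond a quick look, the --json output is probably the better thing to parse, but I have not reproduced its schema here.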
pypinfo looks fantastic -- good find, thanks.
Download counts are being removed from PyPI, so BigQuery needs to be used instead.
See https://github.com/pypa/warehouse/pull/2480, which removes top_packages from the API.
Right now, PyPI is running from https://github.com/pypa/pypi-legacy but will be switching to https://github.com/pypa/warehouse soon. (Their milestones show they're 95% complete to launch, and 38% complete to shut down legacy PyPI.)
See https://github.com/badges/shields/issues/716 and https://github.com/zhmcclient/python-zhmcclient/pull/73 for some more info on BigQuery.
Perhaps in the short term Python Wheels could use a hardcoded list of the top 360 packages.