meshy / pythonwheels

Adoption analysis of Python Wheels: https://pythonwheels.com/
BSD 2-Clause "Simplified" License

Use fresher download stats from PyPI #100

Closed. hugovk closed this 6 years ago

hugovk commented 6 years ago

Fixes #94.

I've made top-pypi-packages, a weekly dump of the 5,000 most-downloaded packages from PyPI, generated with pypinfo.

As top_packages is being removed from the PyPI API, here's a PR that uses data from top-pypi-packages instead.

You'll need to add a wget or curl step to download the latest version of either data file before running generate.py; see .travis.yml for an example.
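For readers without the .travis.yml at hand, here is a minimal Python sketch of that download-then-read step. It is not the code in this PR: the file URL, the `{days}` naming scheme, and the JSON shape (a `"rows"` list of `{"project", "download_count"}` entries) are assumptions about the top-pypi-packages dump, and `download_data` / `top_projects` are hypothetical helper names. The real generate.py may read the file differently.

```python
import json
import urllib.request

# Assumed base URL and file naming for the top-pypi-packages dumps; verify before use.
BASE_URL = "https://hugovk.github.io/top-pypi-packages"


def download_data(days: int = 365) -> str:
    """Fetch the 30- or 365-day dump to a local file (the wget/curl step).

    Hypothetical helper; requires network access.
    """
    filename = f"top-pypi-packages-{days}-days.json"
    urllib.request.urlretrieve(f"{BASE_URL}/{filename}", filename)
    return filename


def top_projects(raw_json: str, limit: int = 360) -> list[str]:
    """Return the names of the `limit` most-downloaded projects.

    Assumes a "rows" list of {"project": ..., "download_count": ...} entries;
    rows are sorted here rather than relying on the file's own order.
    """
    rows = json.loads(raw_json)["rows"]
    rows = sorted(rows, key=lambda row: row["download_count"], reverse=True)
    return [row["project"] for row in rows[:limit]]


if __name__ == "__main__":
    # Offline usage with made-up sample data in the assumed format.
    sample = (
        '{"rows": [{"project": "six", "download_count": 10},'
        ' {"project": "pip", "download_count": 30}]}'
    )
    print(top_projects(sample, limit=1))
```

Switching between the 30-day and 365-day data is then just a matter of the `days` argument, under the file-naming assumption above.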

There's a choice of data covering the preceding 30 days or 365 days. A shorter timespan may give more "relevant" results. Here's a comparison.

[Screenshots: Pre-PR, 365 days, 30 days]

meshy commented 6 years ago

Thank you very much for this, and for the website this is drawing the information from -- it's a great tool.

If we're making this change (and I think we should), I think we ought to have some explanation of why the data might not be what people expect any more. In particular, the website should mention that we used to get the stats for all downloads ever, and how this is now going to be more volatile.

In terms of the volatility of the data, I think I'd prefer to keep that low, and go with the 365-day source.

I'm open to suggestions otherwise though, if anyone feels strongly that the data should be more fluid, month-to-month?

hugovk commented 6 years ago

You're welcome!

> If we're making this change (and I think we should), I think we ought to have some explanation of why the data might not be what people expect any more. In particular, the website should mention that we used to get the stats for all downloads ever, and how this is now going to be more volatile.

Sounds sensible. Would you like to word it, or shall I propose something?

> In terms of the volatility of the data, I think I'd prefer to keep that low, and go with the 365-day source.
>
> I'm open to suggestions otherwise though, if anyone feels strongly that the data should be more fluid, month-to-month?

Sure, 365 days is good. An argument for 30 days is that "stale" packages will drop off the list sooner, but if you want more stability, 365 it is. And everything in the top 360 is a big hitter anyway.

meshy commented 6 years ago

> Sounds sensible. Would you like to word it, or shall I propose something?

I'm afraid I don't have the time to word this at the moment, so I'd be very grateful if you were to do it :)

hugovk commented 6 years ago

No problem! How's that last commit look?

meshy commented 6 years ago

@hugovk That's fantastic, thank you very much once again.

I hope you don't mind, I'm not going to merge this quite yet, as I hope to find the time in the next little while to try this out.

I'm expecting to be especially busy in the next few weeks, so I'm sorry to say it's not right at the top of the list, but hopefully I'll be able to merge and deploy this soon!

hugovk commented 6 years ago

You're welcome, and thanks also for python-wheels! I used it as the basis for https://hugovk.github.io/drop-python/

hugovk commented 6 years ago

Here's a static build showing wheels are up from 263 / 360 in April to 291 / 360!

https://hugovk.github.io/pythonwheels/

meshy commented 6 years ago

Thank you for this, and your patience. I'll try to update the server now.

hugovk commented 6 years ago

Looks good!