hugovk / top-pypi-packages

A regular dump of the most-downloaded packages from PyPI
https://hugovk.github.io/top-pypi-packages
[Question] How is the number of downloads computed? #34

Closed ternaus closed 4 months ago

ternaus commented 4 months ago

If I use a link like https://pypistats.org/packages/albumentations, the numbers look around 10% higher.

I am all for filtering noise from the data, just curious what exactly is filtered.

hugovk commented 4 months ago

This repo uses https://github.com/ofek/pypinfo to query BigQuery over the previous 30 days:

https://github.com/hugovk/top-pypi-packages/blob/344411331e24d61307be65cede4facffd84cad5c/generate.sh#L19

pypinfo defaults to only downloads from the pip installer:

https://github.com/ofek/pypinfo/issues/46#issuecomment-388631707

https://pypistats.org/about queries BigQuery directly, so I think it includes all installers.
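
To make the difference concrete, here is a rough sketch of how a pip-only count and an all-installers count might differ at the query level. It is not pypinfo's or pypistats' actual query. `build_query` is a hypothetical helper, and the sketch assumes the public `bigquery-public-data.pypi.file_downloads` table with its `file.project`, `timestamp` and `details.installer.name` columns. It only builds the SQL string and does not contact BigQuery.

```python
from typing import Optional

# Assumption: the public PyPI downloads table on BigQuery.
TABLE = "bigquery-public-data.pypi.file_downloads"


def build_query(project: str, days: int = 30, installer: Optional[str] = "pip") -> str:
    """Hypothetical helper: SQL counting downloads of `project` over the last `days` days.

    installer="pip" mimics a pip-only default; installer=None counts every installer.
    Values are interpolated directly for illustration only; real code should use
    query parameters instead.
    """
    conditions = [
        f"file.project = '{project}'",
        f"DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL {days} DAY)",
    ]
    if installer is not None:
        # This single filter is the whole difference between the two counts.
        conditions.append(f"details.installer.name = '{installer}'")
    return (
        "SELECT COUNT(*) AS download_count\n"
        f"FROM `{TABLE}`\n"
        "WHERE " + "\n  AND ".join(conditions)
    )


print(build_query("albumentations"))                  # pip only
print(build_query("albumentations", installer=None))  # all installers
```

The only difference between the two generated queries is the `details.installer.name` condition, which would be enough to explain a gap of the size reported above.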

ternaus commented 4 months ago

> pypinfo defaults to only downloads from the pip installer:

Thank you. Just to verify: does pypinfo simply not count downloads from mirrors?

hugovk commented 4 months ago

It's all downloads logged by PyPI. By default, that does not include downloads from mirrors, other clients or ancient pip.

We can check by grouping on the installer field:

❯ pypinfo albumentations installer
Served from cache: False
Data processed: 1.05 GiB
Data billed: 1.05 GiB
Estimated cost: $0.01

| installer_name | download_count |
| -------------- | -------------- |
| pip            |      1,902,285 |

Checking with all installers:

❯ pypinfo --all albumentations installer
Served from cache: False
Data processed: 1.05 GiB
Data billed: 1.05 GiB
Estimated cost: $0.01

| installer_name | download_count |
| -------------- | -------------- |
| pip            |      1,902,266 |
| uv             |        149,990 |
| poetry         |         31,174 |
| None           |         25,125 |
| requests       |          8,042 |
| bandersnatch   |          2,042 |
| Nexus          |          1,903 |
| pdm            |          1,428 |
| Browser        |          1,023 |
| Bazel          |            449 |
| Total          |      2,123,442 |
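
As a quick sanity check on the "around 10% higher" observation, the rows listed in the table above can be summed and compared with the pip row. This is only arithmetic on the numbers shown here, not a new query, and the variable names are illustrative.

```python
# Download counts copied from the `pypinfo --all albumentations installer` table.
counts = {
    "pip": 1_902_266,
    "uv": 149_990,
    "poetry": 31_174,
    "None": 25_125,
    "requests": 8_042,
    "bandersnatch": 2_042,
    "Nexus": 1_903,
    "pdm": 1_428,
    "Browser": 1_023,
    "Bazel": 449,
}

total = sum(counts.values())
pip_only = counts["pip"]
pip_share = pip_only / total   # fraction of listed downloads that came from pip
extra = total / pip_only - 1   # how much higher the all-installers count is

print(f"total = {total:,}")                           # total = 2,123,442
print(f"pip share = {pip_share:.1%}")                 # pip share = 89.6%
print(f"all installers vs pip-only: +{extra:.1%}")    # all installers vs pip-only: +11.6%
```

So for this package and period, counting all installers gives a figure roughly 12% above the pip-only one, which is in line with the gap seen on pypistats.org.
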

ternaus commented 4 months ago

Got it, thanks. This is very helpful. Will update the description at https://pypilb.vercel.app/