mayeut / manylinux-timeline

Tracking manylinux progress on packager side
https://mayeut.github.io/manylinux-timeline/
BSD 2-Clause "Simplified" License

Ability to compute stats for a particular package? #529

Closed alex closed 6 months ago

alex commented 10 months ago

Hi!

This is a wonderful resource. I'm very interested in having stats for one of my packages (cryptography), so we can know when we can specifically drop older manylinux.

I don't expect the website to provide this for us (though I wouldn't say no if it did!), but I'm wondering if the scripts can be repurposed to do this?

mayeut commented 10 months ago

I've been meaning to add something like this to pypinfo but never found the time (or real motivation) to do this.

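If going with the scripts here, you only need the consumer part, so it might be "as easy" as:

  • clearing the consumer_data folder
  • adding AND file.project = "cryptography" to the WHERE clause of the BigQuery query in update_consumer_data.py
  • running nox -s run -- -v --skip-cache --bigquery-credentials ./bigquery_credentials.json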

Otherwise, you can extract the raw data using pypinfo. You'll need the pip version (to know whether it's PEP 600 compliant) and the glibc version. Unfortunately, the installer-version field creates too much fragmentation to really see anything useful from the raw data.

pypinfo --limit 1000 --markdown --percent --where 'REGEXP_CONTAINS(file.filename, r"-manylinux([0-9a-zA-Z_]+)\.whl") AND details.distro.libc.lib = "glibc"' 'cryptography==42.*' libc-version installer-version
Served from cache: False
Data processed: 27.12 GiB
Data billed: 27.12 GiB
Estimated cost: $0.14

| libc_version | installer_version | percent | download_count |
| ------------ | ----------------- | ------: | -------------: |
| 2.35         | 23.3.2            |  28.33% |     11,875,858 |
| 2.31         | 21.0.1            |  19.58% |      8,207,497 |
| 2.31         | 23.3.2            |  16.74% |      7,019,568 |
| 2.36         | 23.2.1            |   4.42% |      1,854,368 |
| 2.28         | 23.3.2            |   4.19% |      1,757,003 |
| 2.36         | 23.3.2            |   4.18% |      1,750,943 |
| 2.36         | 23.0.1            |   2.29% |        958,621 |
| 2.31         | 23.3.1            |   2.24% |        939,503 |
| 2.35         | 23.3.1            |   1.85% |        775,567 |
| 2.31         | 23.2.1            |   1.62% |        677,250 |
| 2.28         | 23.0.1            |   1.14% |        478,933 |
| 2.31         | 21.2.4            |   1.05% |        441,715 |
| 2.31         | 23.0.1            |   0.86% |        362,243 |
| 2.35         | 23.2.1            |   0.78% |        327,675 |
| 2.35         | 22.0.2            |   0.76% |        319,289 |
| 2.31         | 23.1.2            |   0.74% |        312,184 |
| 2.36         | 23.3.1            |   0.54% |        227,185 |
| 2.31         | 22.3.1            |   0.46% |        191,173 |
| 2.31         | 22.0.4            |   0.40% |        169,766 |
| 2.34         | 23.3.2            |   0.40% |        167,205 |
| 2.31         | 24.0              |   0.35% |        144,861 |
| 2.28         | 21.3.1            |   0.31% |        131,062 |
| 2.28         | 22.0.4            |   0.31% |        130,622 |
| 2.28         | 23.3.1            |   0.29% |        121,508 |
| 2.37         | 23.3.1            |   0.28% |        119,154 |
....
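
Reading the pip/glibc pairs above by hand is tedious, so here is a minimal sketch of how one row could be checked against a given wheel. It is not part of pypinfo or the manylinux-timeline scripts: the names `parse_version`, `LEGACY_MIN_PIP`, and `can_install` are hypothetical. It assumes the usual pip thresholds (8.1 for manylinux1, 19.0 for manylinux2010, 19.3 for manylinux2014, 20.3 for PEP 600 `manylinux_X_Y` tags) and that a wheel built for glibc 2.5, 2.12, or 2.17 also carries its legacy alias tag.

```python
# Hypothetical helper, not taken from pypinfo or manylinux-timeline.

def parse_version(text):
    """Turn '2.38.9000' or '23.3.2' into a tuple of ints for comparison.

    Assumes purely numeric dotted versions; pre-release strings such as
    '20.3b1' are not handled.
    """
    return tuple(int(part) for part in text.split("."))


# glibc baseline of the legacy tags -> minimum pip that understands the alias
# (manylinux1, manylinux2010, manylinux2014). Assumption: the wheel is
# published with the legacy alias whenever one exists.
LEGACY_MIN_PIP = {
    (2, 5): (8, 1),
    (2, 12): (19, 0),
    (2, 17): (19, 3),
}

# Any other baseline can only be expressed as a PEP 600 manylinux_X_Y tag.
PEP600_MIN_PIP = (20, 3)


def can_install(wheel_glibc, client_glibc, client_pip):
    """Return True if a client could install a wheel built for wheel_glibc."""
    wheel = parse_version(wheel_glibc)[:2]
    glibc = parse_version(client_glibc)[:2]
    pip = parse_version(client_pip)
    if glibc < wheel:
        return False  # the system glibc is older than the wheel's baseline
    return pip >= LEGACY_MIN_PIP.get(wheel, PEP600_MIN_PIP)
```

For example, `can_install("2.28", "2.31", "21.0.1")` corresponds to the second row of the table and asks whether those downloads could have used a `manylinux_2_28` wheel.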

But you can get an idea just using libc:

pypinfo --limit 1000 --markdown --percent --where 'REGEXP_CONTAINS(file.filename, r"-manylinux([0-9a-zA-Z_]+)\.whl") AND details.distro.libc.lib = "glibc"' 'cryptography==42.*' libc-version
Served from cache: False
Data processed: 25.34 GiB
Data billed: 25.35 GiB
Estimated cost: $0.13

| libc_version | percent | download_count |
| ------------ | ------: | -------------: |
| 2.31         |  45.87% |     19,229,284 |
| 2.35         |  33.34% |     13,975,003 |
| 2.36         |  11.72% |      4,913,470 |
| 2.28         |   7.44% |      3,117,450 |
| 2.34         |   0.88% |        367,114 |
| 2.37         |   0.48% |        201,499 |
| 2.38         |   0.25% |        104,778 |
| 2.32         |   0.01% |          3,517 |
| 2.29         |   0.01% |          3,062 |
| 2.33         |   0.00% |          2,057 |
| 2.27         |   0.00% |          1,644 |
| 2.38.9000    |   0.00% |          1,397 |
| 2.30         |   0.00% |            506 |
| 2.17         |   0.00% |            214 |
| 2.26         |   0.00% |            185 |
| 2.39         |   0.00% |             29 |
| 2.37.9000    |   0.00% |              1 |
| Total        |         |     41,921,210 |

alex commented 10 months ago

Yes, I think pypinfo may be more than enough. Really the only missing feature is a "sum of percent so far" value.
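
One way to get that running total without changing pypinfo is to post-process its `--markdown` output. The following is a minimal sketch under a few assumptions: the function names `parse_rows` and `cumulative_share` are hypothetical, the first column is the glibc version and the last column is the download count, and rows are ordered by glibc version (oldest first) rather than by download count, so each running value reads as "share of downloads at or below this glibc". The sample holds only the four largest rows from the libc-only table in this thread, so its percentages differ from the full result.

```python
# Hypothetical post-processing of a pypinfo --markdown table.

def parse_rows(markdown):
    """Return [(libc_version, download_count)] from a markdown table."""
    rows = []
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip blank lines, the header, the separator, and the Total row:
        # only data rows start with a digit.
        if len(cells) < 2 or not cells[0][:1].isdigit():
            continue
        rows.append((cells[0], int(cells[-1].replace(",", ""))))
    return rows


def cumulative_share(rows):
    """Return [(version, count, running_percent)] sorted by version."""
    total = sum(count for _, count in rows)
    ordered = sorted(
        rows, key=lambda row: tuple(int(part) for part in row[0].split("."))
    )
    running = 0
    result = []
    for version, count in ordered:
        running += count
        result.append((version, count, 100.0 * running / total))
    return result


SAMPLE = """
| libc_version | percent | download_count |
| ------------ | ------: | -------------: |
| 2.31         |  45.87% |     19,229,284 |
| 2.35         |  33.34% |     13,975,003 |
| 2.36         |  11.72% |      4,913,470 |
| 2.28         |   7.44% |      3,117,450 |
"""

if __name__ == "__main__":
    for version, count, running in cumulative_share(parse_rows(SAMPLE)):
        print(f"{version:<10} {count:>12,} {running:6.2f}%")
```

Sorting by version instead of by popularity is a design choice: it lets you read off directly how many downloads would be lost by raising the minimum glibc to a given value.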

On Sun, Feb 4, 2024, 3:30 PM Matthieu Darbois @.***> wrote:

I've been meaning to add something like this to pypinfo (https://github.com/ofek/pypinfo) but never found the time (or real motivation) to do this.

If going with the scripts here, then, you only need the consumer part so, it might be "as easy" as:

  • clearing the consumer_data folder
  • adding AND file.project = "cryptography" to the WHERE clause of the bigquery query in update_consumer_data.py.
  • running nox -s run -- -v --skip-cache --bigquery-credentials ./bigquery_credentials.json
