pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Release feeds (RSS) sometimes report new versions out of order #12925

Closed mgorny closed 1 year ago

mgorny commented 1 year ago

Describe the bug I've noticed that for some projects, a new release appears at the very end of the release feed rather than at the beginning.

For example, https://pypi.org/rss/project/atomicwrites/releases.xml lists releases 1.4.0, 1.3.0, …, and then 1.4.1 at the very end. It seems that my feed reader (Liferea) only reports the N newest entries from each feed, so it doesn't report the 1.4.1 version.

Expected behavior I expected the newest releases to be reported at the beginning of the feed (i.e. as the newest "news").

To Reproduce https://pypi.org/rss/project/atomicwrites/releases.xml

My Platform

Verbose curl output

```
$ curl -vvvvvv https://pypi.org/rss/project/atomicwrites/releases.xml >/dev/null
  % Total % Received % Xferd Average Speed Time Time Time Current
  Dload Upload Total Spent Left Speed
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Trying 151.101.0.223:443...
* Connected to pypi.org (151.101.0.223) port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* [CONN-0-0][CF-SSL] TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2856 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [264 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
* subject: CN=pypi.org
* start date: Jul 26 19:45:14 2022 GMT
* expire date: Aug 27 19:45:13 2023 GMT
* subjectAltName: host "pypi.org" matched cert's "pypi.org"
* issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 DV TLS CA 2022 Q3
* SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* h2h3 [:method: GET]
* h2h3 [:path: /rss/project/atomicwrites/releases.xml]
* h2h3 [:scheme: https]
* h2h3 [:authority: pypi.org]
* h2h3 [user-agent: curl/7.87.0]
* h2h3 [accept: */*]
* Using Stream ID: 1 (easy handle 0x55b7477fa2d0)
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
> GET /rss/project/atomicwrites/releases.xml HTTP/2
> Host: pypi.org
> user-agent: curl/7.87.0
> accept: */*
>
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [177 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
< HTTP/2 200
< content-security-policy: base-uri 'self'; block-all-mixed-content; connect-src 'self' https://api.github.com/repos/ https://*.google-analytics.com https://*.analytics.google.com https://*.googletagmanager.com fastly-insights.com *.fastly-insights.com *.ethicalads.io https://api.pwnedpasswords.com https://cdn.jsdelivr.net/npm/mathjax@3.2.2/es5/sre/mathmaps/ https://2p66nmmycsj3.statuspage.io; default-src 'none'; font-src 'self' fonts.gstatic.com; form-action 'self' https://checkout.stripe.com; frame-ancestors 'none'; frame-src 'none'; img-src 'self' https://warehouse-camo.ingress.cmh1.psfhosted.org/ https://*.google-analytics.com https://*.googletagmanager.com *.fastly-insights.com *.ethicalads.io; script-src 'self' https://*.googletagmanager.com https://www.google-analytics.com https://ssl.google-analytics.com *.fastly-insights.com *.ethicalads.io 'sha256-U3hKDidudIaxBDEzwGJApJgPEf2mWk6cfMWghrAa6i0=' https://cdn.jsdelivr.net/npm/mathjax@3.2.2/ 'sha256-1CldwzdEg2k1wTmf7s5RWVd7NMXI/7nxxjJM2C4DqII=' 'sha256-0POaN8stWYQxhzjKS+/eOfbbJ/u4YHO5ZagJvLpMypo='; style-src 'self' fonts.googleapis.com *.ethicalads.io 'sha256-2YHqZokjiizkHi1Zt+6ar0XJ0OeEy/egBnlm+MDMtrM=' 'sha256-47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=' 'sha256-JLEjeN9e5dGsz5475WyRaoA4eQOdNPxDIeUhclnJDCE=' 'sha256-mQyxHEuwZJqpxCw3SLmc4YOySNKXunyu2Oiz1r3/wAE=' 'sha256-OCf+kv5Asiwp++8PIevKBYSgnNLNUZvxAp4a7wMLuKA=' 'sha256-h5LOiLhk6wiJrGsG5ItM0KimwzWQH/yAcmoJDJL//bY=' 'unsafe-inline'; worker-src *.fastly-insights.com
< content-type: text/xml; charset=UTF-8
< etag: "oi55x1TAKjY1ZtSDPlquNg"
< referrer-policy: origin-when-cross-origin
< server: nginx/1.13.9
< accept-ranges: bytes
< date: Thu, 02 Feb 2023 12:17:43 GMT
< x-served-by: cache-iad-kjyo7100068-IAD, cache-hhn-etou8220042-HHN
< x-cache: MISS, HIT
< x-cache-hits: 0, 1
< x-timer: S1675340264.727058,VS0,VE1
< vary: Accept-Encoding
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< x-frame-options: deny
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< x-permitted-cross-domain-policies: none
< content-length: 5197
<
{ [1323 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* [CONN-0-0][CF-SSL] TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
100 5197 100 5197 0 0 17794 0 --:--:-- --:--:-- --:--:-- 17859
* Connection #0 to host pypi.org left intact
```

Additional context In Gentoo we rely heavily on PyPI release feeds to update packages, so this is a major problem causing us to miss package updates.

miketheman commented 1 year ago

This is an interesting case, and likely unique to this package.

atomicwrites specifically has an out-of-order creation date in the database, after the maintainer removed the project in its entirety. The PyPI admin did their best on a weekend to manually restore packages and history, but the database dates were affected in that process.

This can be seen in the feed response, where:

  - 1.4.0 has a pubDate of Sat, 09 Jul 2022 04:09:09 GMT
  - 1.4.1 has a pubDate of Fri, 08 Jul 2022 18:31:40 GMT

This shows that 1.4.1 was published before 1.4.0 was restored, which is a very rare case indeed.

The XML feed is ordered by release creation date. That is generally correct, since most releases happen in chronological order; this project differs only because its dates were affected by the manual restore.
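To illustrate the ordering described above, here is a minimal Python sketch that parses the two `pubDate` values quoted from the feed and sorts them newest first. The `entries` list and the variable names are illustrative and not part of Warehouse; only the two timestamps come from the feed.

```python
from email.utils import parsedate_to_datetime

# (version, pubDate) pairs as quoted from the atomicwrites feed above.
entries = [
    ("1.4.0", "Sat, 09 Jul 2022 04:09:09 GMT"),
    ("1.4.1", "Fri, 08 Jul 2022 18:31:40 GMT"),
]

# Sort newest first by parsed publication date, as a date-ordered feed would.
by_date = sorted(entries, key=lambda e: parsedate_to_datetime(e[1]), reverse=True)
versions_newest_first = [version for version, _ in by_date]
print(versions_newest_first)
```

Sorted this way, 1.4.0 comes ahead of 1.4.1, which matches what the reporter saw in the feed.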

Some solutions I've thought of:

  1. update 1.4.1 timestamp to be after 2022-07-09 04:09:09 - update of one row, changes precise history
  2. update anything not 1.4.1 to be before 2022-07-08 18:31:40 - updates multiple rows, preserves end-user action history

Moving a single version forward:

UPDATE releases
SET created = '2022-07-09 04:10:00'::timestamp
WHERE id =
    (SELECT r.id
     FROM releases r
     JOIN projects p ON p.id = r.project_id
     WHERE p.name = 'atomicwrites'
       AND r.version = '1.4.1'
     LIMIT 1);

Moving all other records back by one day (riskier, please check my SQL):

UPDATE releases
SET created = created - interval '1 day'
WHERE id in
    (SELECT r.id
     FROM releases r
     JOIN projects p ON p.id = r.project_id
     WHERE p.name = 'atomicwrites'
       AND r.version != '1.4.1');
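As a rough way to sanity-check both options, the sketch below replays them against an in-memory SQLite database. The two-table schema is a simplified assumption and not Warehouse's real schema, the 1.3.0 timestamp is made up for illustration, and SQLite's `datetime()` stands in for the PostgreSQL `::timestamp` cast and `interval` arithmetic.

```python
import sqlite3

def feed_order(option):
    """Apply one of the proposed fixes and return versions newest first."""
    db = sqlite3.connect(":memory:")
    # Simplified, assumed schema; the real tables have many more columns.
    db.executescript("""
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE releases (id INTEGER PRIMARY KEY, project_id INTEGER,
                               version TEXT, created TEXT);
        INSERT INTO projects VALUES (1, 'atomicwrites');
        -- The 1.3.0 timestamp is hypothetical; the other two come from the feed.
        INSERT INTO releases VALUES
            (1, 1, '1.3.0', '2022-07-09 04:09:08'),
            (2, 1, '1.4.0', '2022-07-09 04:09:09'),
            (3, 1, '1.4.1', '2022-07-08 18:31:40');
    """)
    ids = """SELECT r.id FROM releases r
             JOIN projects p ON p.id = r.project_id
             WHERE p.name = 'atomicwrites' AND r.version {} '1.4.1'"""
    if option == 1:
        # Option 1: move only 1.4.1 to just after the restored 1.4.0.
        db.execute("UPDATE releases SET created = '2022-07-09 04:10:00' "
                   "WHERE id = (" + ids.format("=") + " LIMIT 1)")
    elif option == 2:
        # Option 2: shift every other release back by one day.
        db.execute("UPDATE releases SET created = datetime(created, '-1 day') "
                   "WHERE id IN (" + ids.format("!=") + ")")
    rows = db.execute("SELECT version FROM releases ORDER BY created DESC")
    return [version for (version,) in rows]

for option in (0, 1, 2):  # 0 leaves the data untouched
    print(option, feed_order(option))
```

Under these assumptions both options put 1.4.1 first; option 2 touches every other row but keeps their relative order.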

@pypi/warehouse-admins Thoughts on either approach? My lean is to update the single row for safety purposes, and as the project is deprecated and archived on GitHub, it is unlikely to ever see another release.

miketheman commented 1 year ago

> so this is a major problem causing us to miss package updates.

@mgorny this is an assertive claim - has this happened on any project other than this special case?

mgorny commented 1 year ago

Thanks for looking into it. Knowing it isn't a bug in the code and that it should therefore be super-rare makes me feel easier ;-).

mgorny commented 1 year ago

> > so this is a major problem causing us to miss package updates.
>
> @mgorny this is an assertive claim - has this happened on any project other than this special case?

Yeah, sorry. I really had no way of knowing, so I've made the uneasy assumption that it may be happening randomly.

dstufft commented 1 year ago

It's not exactly unique to this package. If packages ever have multiple ongoing lines of development that they're releasing then the releases being "out of order" like this could happen as well, e.g. if you put Python itself into this, it would look like an interleaving of 2.7 and 3.x versions.

However, I think this behavior is correct. The RSS feed isn't there to get the highest version; it's there so you can follow along and get updates as new versions are released, regardless of their version number. So if someone legitimately releases a patch to an older version and you're polling the XML, you should see that as the newest release.
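A small Python sketch of that distinction, using made-up release data for a hypothetical project with two maintained lines (the versions and dates are illustrative only): the newest entry by upload date need not be the highest version.

```python
from datetime import date

# Hypothetical releases: a 3.x line plus a later patch to the older 2.7 line.
releases = [
    ("3.1.0", date(2023, 1, 10)),
    ("2.7.5", date(2023, 2, 1)),   # backport released after 3.1.0
    ("3.0.0", date(2022, 11, 5)),
]

def version_key(version):
    """Naive numeric key; only handles plain dotted integers."""
    return tuple(int(part) for part in version.split("."))

newest_by_date = max(releases, key=lambda r: r[1])[0]
highest_version = max(releases, key=lambda r: version_key(r[0]))[0]

print(newest_by_date)   # what a date-ordered feed lists first
print(highest_version)  # what a "highest version" lookup would return
```

For real version strings, a PEP 440-aware parser such as `packaging.version.Version` would be a safer comparison key than the naive tuple used here.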

miketheman commented 1 year ago

Based on the description provided, it does indeed seem like everything is working correctly. @mgorny does that make sense? The RSS feed is ordered by release time rather than by version number, so the most recent release isn't necessarily the one with the highest version number.

mgorny commented 1 year ago

Yes, thank you.