pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Metadata version 2.0 not supported, probably unintentionally #17168

Open dnicolodi opened 5 days ago

dnicolodi commented 5 days ago

I was reading the code in warehouse.forklift.metadata and I found that it tries to support metadata version 2.0:

https://github.com/pypi/warehouse/blob/49d17eef4e8029d716723aa11039fcfe0cf5df73/warehouse/forklift/metadata.py#L35-L35
https://github.com/pypi/warehouse/blob/49d17eef4e8029d716723aa11039fcfe0cf5df73/warehouse/forklift/metadata.py#L73-L86

However, this additional validation function is called only after the metadata parser in packaging has already done its own validation:

https://github.com/pypi/warehouse/blob/49d17eef4e8029d716723aa11039fcfe0cf5df73/warehouse/forklift/metadata.py#L65-L68

packaging does not support metadata version 2.0, which technically does not exist as a standard. Therefore trying to parse metadata version 2.0 results in an exception being raised:

>>> import warehouse.forklift.metadata
>>> warehouse.forklift.metadata.parse(b'''\
... Metadata-Version: 2.0
... Name: foo
... Version: 1.2.3
... ''')

    | packaging.metadata.InvalidMetadata: '2.0' is not a valid metadata version
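To make the ordering problem concrete, here is a minimal sketch using stand-ins only. `PARSER_VERSIONS`, `SUPPORTED_VERSIONS`, `strict_parse`, and the local `InvalidMetadata` class are hypothetical names with made-up values, not the actual packaging or warehouse code. It only illustrates why a version that the first stage rejects can never be accepted by a later check:

```python
# Stand-in for the versions the upstream parser accepts
# (hypothetical values, chosen so that "2.0" is missing).
PARSER_VERSIONS = {"1.0", "1.1", "1.2", "2.1", "2.2", "2.3"}

# Stand-in for the application's own list, which still names "2.0".
SUPPORTED_VERSIONS = PARSER_VERSIONS | {"2.0"}


class InvalidMetadata(ValueError):
    """Hypothetical error type raised by the stand-in parser."""


def strict_parse(fields: dict) -> dict:
    # First stage: the parser rejects versions it does not know.
    version = fields.get("Metadata-Version")
    if version not in PARSER_VERSIONS:
        raise InvalidMetadata(f"{version!r} is not a valid metadata version")
    return fields


def parse(fields: dict) -> dict:
    parsed = strict_parse(fields)
    # Second stage: only reached when the first stage succeeded, so the
    # "2.0" entry in SUPPORTED_VERSIONS is never the deciding factor.
    if parsed["Metadata-Version"] not in SUPPORTED_VERSIONS:
        raise InvalidMetadata("unsupported metadata version")
    return parsed


def accepted_versions() -> set:
    # Collect the versions that survive both stages.
    ok = set()
    for version in sorted(SUPPORTED_VERSIONS):
        try:
            parse({"Metadata-Version": version, "Name": "foo", "Version": "1.2.3"})
        except InvalidMetadata:
            continue
        ok.add(version)
    return ok
```

In this sketch the effective set of accepted versions is the intersection of the two lists, so listing "2.0" only in the second one has no effect.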

Apparently, supporting metadata version 2.0 is not necessary anymore: AFAIK, anyone who tried to upload a package using metadata 2.0 to PyPI would have encountered this error. However, if it is decided to keep metadata 2.0 support, monkeypatching packaging may be the easiest way forward. This seems to work:

>>> import packaging.metadata
>>> packaging.metadata._VALID_METADATA_VERSIONS = ['1.0', '1.1', '1.2', '2.0', '2.1', '2.2', '2.3', '2.4']
>>> import warehouse.forklift.metadata
>>> warehouse.forklift.metadata.parse(b'''\
... Metadata-Version: 2.0
... Name: foo
... Version: 1.2.3
... ''')
<packaging.metadata.Metadata object at 0x10228c2d0>

but I haven't verified that this approach works as intended in all aspects.
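If the monkeypatch route were taken, one way to limit its blast radius might be to scope the override to the parse call instead of reassigning the attribute globally. The sketch below uses a stand-in namespace (`fake_module`, `_VALID_VERSIONS`, `is_valid`, all hypothetical) so that it runs without packaging installed. With the real library, the patch target would presumably be the private `_VALID_METADATA_VERSIONS` attribute shown above, which is an implementation detail and may change between releases:

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for a third-party module holding a private allow-list
# (hypothetical name and values).
fake_module = SimpleNamespace(_VALID_VERSIONS=["1.0", "1.1", "1.2", "2.1"])


def is_valid(version: str) -> bool:
    # The list is looked up at call time, so patching the attribute takes effect.
    return version in fake_module._VALID_VERSIONS


def is_valid_with_extra(version: str, extra: str = "2.0") -> bool:
    patched = [*fake_module._VALID_VERSIONS, extra]
    # patch.object restores the original attribute when the block exits,
    # even if the body raises.
    with mock.patch.object(fake_module, "_VALID_VERSIONS", patched):
        return is_valid(version)
```

Outside the `with` block the original list is back in place, so other callers of the stand-in validator are unaffected. Note that such a patch is process-global while active, so it is not safe to rely on it across threads.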

di commented 5 days ago

Indeed, looks like we have inadvertently dropped support for Metadata 2.0:

SELECT
  FORMAT_DATE('%Y-%m', upload_time) AS upload_month,
  COUNT(*) AS count
FROM
  `bigquery-public-data.pypi.distribution_metadata`
WHERE metadata_version = '2.0'
  AND DATE(upload_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 24 MONTH)
GROUP BY 1
ORDER BY 1
upload_month  count
2022-11          23
2022-12         266
2023-01         132
2023-02         106
2023-03         159
2023-04         219
2023-05         174
2023-06         157
2023-07          95
2023-08         196
2023-09         115
2023-10         143
2023-11         182
2023-12          61
2024-01         177
2024-02          51
2024-03          75

(Note: there is nothing after March 2024, which correlates with https://github.com/pypi/warehouse/pull/15631)
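For a rough sense of scale, the monthly counts from the query result can be summarized as below. The numbers are copied from the table above; the variable names are just for illustration:

```python
# Monthly Metadata 2.0 upload counts, copied from the query result above.
counts = {
    "2022-11": 23, "2022-12": 266, "2023-01": 132, "2023-02": 106,
    "2023-03": 159, "2023-04": 219, "2023-05": 174, "2023-06": 157,
    "2023-07": 95, "2023-08": 196, "2023-09": 115, "2023-10": 143,
    "2023-11": 182, "2023-12": 61, "2024-01": 177, "2024-02": 51,
    "2024-03": 75,
}

total = sum(counts.values())            # all uploads in the window
mean_per_month = total / len(counts)    # average over months with data
peak_month = max(counts, key=counts.get)
last_month = max(counts)                # "YYYY-MM" strings sort chronologically

print(f"{total} uploads over {len(counts)} months "
      f"(mean {mean_per_month:.0f}/month, peak {peak_month}, last {last_month})")
```

That works out to 2331 uploads over the 17 months with data, an average of roughly 137 per month.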

I think, given the low historic counts here & lack of outcry from users, we can just consider this deprecated and drop it from SUPPORTED_METADATA_VERSIONS rather than special-case it.
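A minimal sketch of what dropping the version could look like, together with a consistency check that might have caught the mismatch earlier. The contents of both sets are assumptions for illustration (the real `SUPPORTED_METADATA_VERSIONS` and the parser's list may differ), and `unparseable_supported_versions` is a hypothetical helper:

```python
# Assumed contents, for illustration only; the real sets may differ.
SUPPORTED_METADATA_VERSIONS = {"1.0", "1.1", "1.2", "2.0", "2.1", "2.2", "2.3"}
PARSER_VALID_VERSIONS = {"1.0", "1.1", "1.2", "2.1", "2.2", "2.3", "2.4"}


def unparseable_supported_versions(supported: set, parser_valid: set) -> set:
    # Versions advertised as supported that the parser would reject anyway.
    return supported - parser_valid


# Before the change: "2.0" is listed as supported but can never be parsed.
stale_before = unparseable_supported_versions(
    SUPPORTED_METADATA_VERSIONS, PARSER_VALID_VERSIONS
)

# Proposed change: drop "2.0" instead of special-casing it.
SUPPORTED_AFTER_DROP = SUPPORTED_METADATA_VERSIONS - {"2.0"}
stale_after = unparseable_supported_versions(
    SUPPORTED_AFTER_DROP, PARSER_VALID_VERSIONS
)
```

A check like this could live in a unit test, asserting that the set of stale versions is empty, so that the supported list cannot silently drift from what the parser accepts.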