njsmith opened 1 year ago
See PEP 625. The sdist filename format was standardized 2 years ago, so you can parse it: there should be only one dash, since both the name and the version should be normalized. There are stragglers, and historical releases won't be fixed, but a new tool should be OK with simply ignoring those, though it does need to detect them. Apparently the overwhelming majority of legacy filenames contain multiple dashes, so detecting that could be good enough.
Edit: it's PEP 625 -- just in case you're reading the mail notification.
Ah, yeah, that's another option -- skipping any sdist name with multiple dashes. I was assuming that we couldn't drop compat with old non-compliant artifacts, but maybe we could get away with it.
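A minimal sketch of the multiple-dash check, assuming the `{name}-{version}.tar.gz` form from PEP 625, where dashes in the project name are replaced by underscores and the version is normalized (so it has no dashes). `SdistName` and `parse_sdist_filename` are hypothetical names for illustration, not an existing API.

```rust
/// Hypothetical result type for parsing an sdist filename under PEP 625 rules.
#[derive(Debug, PartialEq)]
enum SdistName<'a> {
    /// Exactly one dash: `{name}-{version}.tar.gz`.
    Compliant { name: &'a str, version: &'a str },
    /// More than one dash: probably a legacy, unnormalized filename.
    Legacy,
    /// Not a `.tar.gz` file, no dash at all, or an empty name/version.
    Invalid,
}

/// Hypothetical parser: only accepts filenames with exactly one dash in the stem.
fn parse_sdist_filename(filename: &str) -> SdistName<'_> {
    let Some(stem) = filename.strip_suffix(".tar.gz") else {
        return SdistName::Invalid;
    };
    let mut parts = stem.split('-');
    // Look at up to three dash-separated pieces; a third piece means 2+ dashes.
    match (parts.next(), parts.next(), parts.next()) {
        (Some(name), Some(version), None) if !name.is_empty() && !version.is_empty() => {
            SdistName::Compliant { name, version }
        }
        (Some(_), Some(_), Some(_)) => SdistName::Legacy,
        _ => SdistName::Invalid,
    }
}
```

A tool built this way would skip `Legacy` results rather than guess where the name ends, which is the "detect them and ignore them" approach described in the comments above.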
When reading https://pypi.org/simple/cffi, we currently see `cffi-1.0.2-2.tar.gz` and parse it as name `cffi-1.0.2`, version `2`. And then in `PackageDB::available_artifacts("cffi")`, we end up filing this under version 2.

I don't think we can parse this sdist name in general, at least not without breaking much more common cases like `scikit-learn-1.0.2.tar.gz`. But a very simple thing we could do is, when reading a simple API page, ignore all entries whose parsed name doesn't match the project of the simple API page we're looking at! (I guess we could also get fancier, and try to use the simple API page to bias the sdist name parsing? But I think stuff like `cffi-1.0.2-2.tar.gz` is super rare and we can probably just skip it.)
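The "ignore entries whose name doesn't match the page" idea can be sketched as follows. This assumes PEP 503 name normalization (lowercase, runs of `-`, `_`, `.` collapsed to one `-`) for the comparison, and it reproduces the last-dash split that turns `cffi-1.0.2-2.tar.gz` into name `cffi-1.0.2`. `normalize_name`, `naive_split`, and `sdists_for_project` are hypothetical helpers, not the actual `PackageDB` code.

```rust
/// Hypothetical helper: PEP 503 name normalization. Lowercase the name and
/// collapse every run of `-`, `_` or `.` into a single `-`.
fn normalize_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut in_separator_run = false;
    for c in name.chars() {
        if matches!(c, '-' | '_' | '.') {
            // Emit one `-` for the whole run of separators.
            if !in_separator_run {
                out.push('-');
                in_separator_run = true;
            }
        } else {
            out.extend(c.to_lowercase());
            in_separator_run = false;
        }
    }
    out
}

/// Hypothetical helper mirroring the behavior described in the issue: split
/// the stem on its *last* dash, so `cffi-1.0.2-2.tar.gz` -> ("cffi-1.0.2", "2").
fn naive_split(filename: &str) -> Option<(&str, &str)> {
    let stem = filename.strip_suffix(".tar.gz")?;
    stem.rsplit_once('-')
}

/// Hypothetical filter: keep only (name, version) pairs whose parsed name
/// normalizes to the project this simple API page is for; skip the rest.
fn sdists_for_project<'a>(project: &str, filenames: &[&'a str]) -> Vec<(&'a str, &'a str)> {
    let wanted = normalize_name(project);
    filenames
        .iter()
        .copied()
        .filter_map(naive_split)
        .filter(|&(name, _)| normalize_name(name) == wanted)
        .collect()
}
```

On the `cffi` page, the straggler parses to name `cffi-1.0.2`, which normalizes to `cffi-1-0-2`, so it is dropped instead of being filed under version 2. Meanwhile `scikit-learn-1.0.2.tar.gz` still splits correctly on the last dash and matches its own page.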