pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Packages without releases should not be on /simple #4520

Open cztomsik opened 6 years ago

cztomsik commented 6 years ago

Describe the bug: /simple/, list_packages and list_packages_with_serials all return removed packages.

Expected behavior: Only packages listed on pypi.org should be returned.

To Reproduce: Go to https://pypi.org/simple/ and search for the packages 0 or rever. These packages are not installable.

Additional context: I've quickly skimmed through the code and I think (with my limited Python knowledge) that removing a project actually performs the delete, so this could be some kind of stale data in the database.

ewdurbin commented 6 years ago

For a given package, PyPI has Projects and Releases. The examples provided are Projects that were registered but never uploaded a release (previously allowed), or whose releases were later removed.

We probably need a discussion on how, or whether, we want to filter them from /simple. cc @dstufft @di

cztomsik commented 6 years ago

So it probably works as intended, but I'd still argue that it's a little unexpected. I mean, if you call list_packages you will get 0 and rever (and probably a lot of others), and none of those can be obtained through https://pypi.org/pypi/<pkg>/json

It seems bandersnatch has a similar check: https://bitbucket.org/pypa/bandersnatch/src/9fa97648f980d25f7c255f6d513da4bc6f6be2aa/src/bandersnatch/package.py?at=default&fileviewer=file-view-default#package.py-119
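The mismatch described above can be checked with a short script. This is only a rough sketch: `json_exists` and `names_without_json` are hypothetical helper names, and it assumes that a 404 from the JSON endpoint is what "cannot be obtained" means here.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Iterable, List

# Endpoints mentioned in this thread.
PYPI_XMLRPC = "https://pypi.org/pypi"
PYPI_JSON = "https://pypi.org/pypi/{name}/json"


def json_exists(name: str) -> bool:
    """Return True if the JSON endpoint answers 200 for this project name."""
    url = PYPI_JSON.format(name=urllib.parse.quote(name))
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return False
        raise  # anything else is a real error, not a "missing" project


def names_without_json(
    names: Iterable[str],
    exists: Callable[[str], bool] = json_exists,
) -> List[str]:
    """Return the names for which the `exists` check fails."""
    return [name for name in names if not exists(name)]


# Usage against the live index (network access required, so left commented):
#   import xmlrpc.client
#   client = xmlrpc.client.ServerProxy(PYPI_XMLRPC)
#   print(names_without_json(client.list_packages()[:50]))
```

The `exists` parameter is injectable so the comparison logic can be tried without hitting the network.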

di commented 6 years ago

I've revised the title a bit here to better describe what is happening.

I don't see any reason to list these packages (i.e. those without any releases) in /simple: they will never be installable, and removing them will make requests to /simple a little more lightweight as well.
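As a rough illustration of that filter, here is a sketch using an in-memory SQLite database. The `projects` and `releases` tables, their columns, the `example-pkg` row, and `simple_index_names` are all made up for this example and are not Warehouse's actual schema or code.

```python
import sqlite3

# Hypothetical two-table layout: one row per project, one row per release.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (name TEXT PRIMARY KEY);
    CREATE TABLE releases (
        project TEXT REFERENCES projects(name),
        version TEXT
    );
    -- Sample rows: only example-pkg has a release.
    INSERT INTO projects VALUES ('0'), ('rever'), ('example-pkg');
    INSERT INTO releases VALUES ('example-pkg', '1.0');
    """
)


def simple_index_names(conn: sqlite3.Connection) -> list:
    """Return only the project names that have at least one release."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM projects AS p
        WHERE EXISTS (SELECT 1 FROM releases AS r WHERE r.project = p.name)
        ORDER BY p.name
        """
    )
    return [name for (name,) in rows]


print(simple_index_names(conn))  # prints ['example-pkg']
```

An `EXISTS` subquery is used here rather than a join so that a project with many releases still appears only once.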

cztomsik commented 6 years ago

Awesome, what about those two xmlrpc api calls? list_packages and list_packages_with_serials?

BTW: I could probably help (I've noticed there is a Docker container), but I'm not sure how to correctly replicate this issue (maybe importing the SQL for a few packages would help, if you can export that for me).

ewdurbin commented 6 years ago

Not sure what the implications are of changing the XMLRPC responses. We generally shy away from invasive changes to them. For instance, we'd need to coordinate with projects like https://github.com/pypa/bandersnatch to ensure we're not interfering with their usage https://github.com/pypa/bandersnatch/blob/1d562e857b8b6755acac2341721b2d5e760eb920/src/bandersnatch/master.py#L80-L81

cztomsik commented 6 years ago

Yeah, but we could probably add an option there (include_empty?). It's similar to /simple/

https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/xmlrpc/views.py#L198
https://github.com/pypa/warehouse/blob/master/warehouse/legacy/api/simple.py#L44
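To make the suggestion concrete, here is a sketch of what an opt-in flag might look like. The `include_empty` parameter does not exist in the real XML-RPC API, and the `PROJECTS` dict is a hypothetical in-memory stand-in for the database. Defaulting the flag to True would keep the current responses unchanged for existing callers.

```python
from typing import Dict, List

# Hypothetical stand-in data: project name -> list of release versions.
PROJECTS: Dict[str, List[str]] = {
    "0": [],
    "rever": [],
    "example-pkg": ["1.0"],
}


def list_packages(include_empty: bool = True) -> List[str]:
    """Return project names, optionally skipping projects with no releases.

    include_empty=True mirrors the current behavior (every registered
    project is listed); include_empty=False drops release-less projects.
    """
    return sorted(
        name
        for name, releases in PROJECTS.items()
        if include_empty or releases
    )


print(list_packages())                     # every registered project
print(list_packages(include_empty=False))  # only projects with releases
```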

di commented 6 years ago

Yeah, I agree with @ewdurbin, it's probably not worth it to change the behavior of the XML-RPC endpoints at this point in time, but probably something we should consider for the API that will eventually replace them.