pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

pip3 search finds only libraries with whitespace/dash after search term #5506

Open meridsa opened 5 years ago

meridsa commented 5 years ago

Environment

Description Searching for a term with pip3 search [foo] only returns libraries or descriptions in which [foo] is followed by whitespace or a dash, although matches with any prefix or any upper/lower-case combination are returned. I am wondering if this is a result of the terminal adding whitespace after any command that is entered.

Expected behavior Returning all libraries that include the term in any way in the description or name.

How to Reproduce Search for a term that you know appears in a library's name, where the name does not end with that term. Ex: pip3 search wx -> fails to return wxPython

This issue was first raised in the pip repository, where I was told that this is the more appropriate repo for it.

yeraydiazdiaz commented 5 years ago

For reference: https://github.com/pypa/pip/issues/6309

di commented 5 years ago

Seems like the actual issue here is that the query is for a package whose name on PyPI has mixed case. This is reproducible for pip search django (Django) and pip search sqlalchemy (SQLAlchemy) but not for pip search tensorflow, pip search twine or pip search pip, etc.

yeraydiazdiaz commented 5 years ago

I've had a look at this and it seems pip search is partially to blame for this behaviour.

The search command makes an XMLRPC request with a spec passing the term as the name and summary fields, along with an or parameter (https://github.com/pypa/pip/blob/master/src/pip/_internal/commands/search.py#L65).
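For illustration, the request described above could be sketched roughly as follows. This is a minimal sketch and not pip's actual code: build_search_args and search are hypothetical helper names, and the proxy is assumed to be an xmlrpc.client.ServerProxy (or anything else exposing a search method).

```python
import xmlrpc.client  # only needed to build a real proxy; see the note below


def build_search_args(term):
    """Return the (spec, operator) pair described above.

    The same term is sent for both the name and the summary field,
    and the "or" operator asks for a match on either field.
    """
    spec = {"name": term, "summary": term}
    return spec, "or"


def search(term, proxy):
    """Send the search through an XML-RPC proxy (hypothetical helper)."""
    spec, operator = build_search_args(term)
    return proxy.search(spec, operator)


print(build_search_args("wx"))
```

A real proxy would be created with xmlrpc.client.ServerProxy and the index URL. No network call is made in the sketch itself.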

In warehouse the procedure uses a bool query with a should clause, which does return a flavor of logical OR but ranks the results by score.
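As a rough sketch of what such a query looks like in the Elasticsearch query DSL (this is not the query warehouse actually builds; the function name, the use of match clauses, and the size of 100 are assumptions for illustration):

```python
def build_bool_should_query(term, size=100):
    """Build a bool query whose only clauses are "should" clauses.

    With no must or filter clauses, a document needs to match at least
    one should clause, which gives OR-like behaviour. Documents matching
    more clauses get a higher score and are returned first.
    """
    return {
        "size": size,  # assumed cap on the number of hits returned
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": term}},
                    {"match": {"summary": term}},
                ]
            }
        },
    }


print(build_bool_should_query("django"))
```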

It seems that "derived" packages tend to include the original framework's name in their summary, which makes them rank higher in the results and, if there are enough of them, pushes the upstream package out of the top 100 results.
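The ranking effect can be reproduced with a toy model. The sketch below is not Elasticsearch's real scoring: it simply awards one point per field that contains the term as a word, and the package summaries are made up for the example.

```python
def score(package, term):
    """Toy score: one point for each field that contains the term as a word."""
    term = term.lower()
    points = 0
    for field in ("name", "summary"):
        words = package[field].lower().replace("-", " ").split()
        if term in words:
            points += 1
    return points


def top_results(packages, term, limit):
    """Return the names of the best-scoring matches, cut off at `limit`."""
    matches = [p for p in packages if score(p, term) > 0]
    matches.sort(key=lambda p: score(p, term), reverse=True)
    return [p["name"] for p in matches[:limit]]


# Hypothetical data: the upstream summary does not repeat its own name,
# while the derived packages mention it in both name and summary.
packages = [
    {"name": "Django", "summary": "A high-level Python web framework."},
    {"name": "django-alpha", "summary": "Alpha helpers for Django projects."},
    {"name": "django-beta", "summary": "Beta helpers for Django projects."},
]

print(top_results(packages, "django", limit=2))
```

With a limit of 2, the two derived packages fill the result list and the upstream package is cut off, even though it matches the query.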

Given that the last commit for that line is 5 years old, pip has been including the summary field for quite some time. Is this something that's different from the legacy implementation of the XMLRPC search? Is it worth recreating that behaviour?

/cc @pfmoore @pradyunsg

stephen-dexda commented 5 years ago

Behaviour seems to be just that pip search only finds "whole word" matches, where a "word" is a string delimited by spaces or hyphens (but not e.g. underscores - pip search prometheus will not find prometheus_client).
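A small sketch of that "whole word" behaviour, assuming a tokenizer that splits only on whitespace and hyphens. This is a toy model of the observation above, not the analyzer warehouse is actually configured with.

```python
import re


def tokenize(text):
    """Lowercase and split on whitespace and hyphens; underscores are kept."""
    return [token for token in re.split(r"[\s-]+", text.lower()) if token]


def whole_word_match(term, text):
    """True only if the term equals one complete token of the text."""
    return term.lower() in tokenize(text)


print(tokenize("prometheus-client"))
print(tokenize("prometheus_client"))
print(whole_word_match("prometheus", "prometheus_client"))
```

Under this model prometheus matches prometheus-client but not prometheus_client, and wx does not match wxPython, because in both failing cases the term is only part of a longer token.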

McSinyx commented 4 years ago

Coming from the pip issue: would it be sensible for warehouse to provide a wildcard search API, e.g. one accepting a regex or a wildcard search term? I imagine that pip search could then use this new query for the name field and maybe stick with the old match for summary.
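To make the proposal concrete, here is a client-side sketch of the intended semantics: a wildcard match on the name and a whole-word match on the summary. No such warehouse API exists in this thread; the search function and the package data are hypothetical.

```python
from fnmatch import fnmatchcase


def search(term, packages):
    """Wildcard match on the name, whole-word match on the summary."""
    term = term.lower()
    pattern = f"*{term}*"  # the term may appear anywhere in the name
    hits = []
    for name, summary in packages:
        name_hit = fnmatchcase(name.lower(), pattern)
        summary_hit = term in summary.lower().split()
        if name_hit or summary_hit:
            hits.append(name)
    return hits


# Hypothetical (name, summary) pairs for illustration only.
packages = [
    ("wxPython", "GUI toolkit"),
    ("prometheus_client", "Metrics client"),
    ("requests", "HTTP library"),
]

print(search("wx", packages))
print(search("prometheus", packages))
```

With these semantics wx finds wxPython and prometheus finds prometheus_client, while a term such as http is still found through the summary.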

socketpair commented 4 years ago

$ pip3 search uring
io-uring (0.0.1)               - This is a light-weight python wrapper around "io_uring" linux library.
pycopy-cpython-ure (0.2)       - Pycopy module ure ported to CPython
micropython-cpython-ure (0.1)  - MicroPython module ure ported to CPython

$ pip3 search liburing
liburing (2020.7.13)  - This is a Python + CFFI wrapper around Liburing C library, which is a helper to setup and tear-down io_uring
                        instances.

You know what? I only discovered the liburing package through Google, because neither the PyPI website nor pip search found it!

Even more interesting, the website returns 14 results for the same query. Why?

socketpair commented 4 years ago

Another question is about pycopy-cpython-ure. It DOES NOT contain uring. What's happening?!