pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Search API to search over keywords #3436

Open mitar opened 6 years ago

mitar commented 6 years ago

With the planned deprecation of the XML-RPC API, it seems there will be no way to search packages by their keywords. So I would like to make a feature request for this in the new API.

brainwane commented 6 years ago

@mitar Thank you for bringing this up! We will only remove the XML-RPC API when its functionality is covered by other new APIs, so I have put this in a future milestone, and I've tagged it so people will see it when looking at API issues. Thanks again.

yuvalreches commented 6 years ago

Hey @brainwane Looking at your roadmap and this thread, do I understand correctly that the old XML-RPC API will remain operational in the near future under pypi.org as well?

Currently, when performing pip search <packageName> under pypi.python.org, we get HTTP 302 and are redirected to pypi.python.org/pypi

However when performing the same (pip search) under pypi.org we get HTTP error 404 while getting http://pypi.org/RPC2

Isn't it supposed to be redirected to the old API for now? Thanks

di commented 6 years ago

@yuvalreches:

Looking at your roadmap and this thread, do I understand correctly that the old XML-RPC API will remain operational in the near future under pypi.org as well?

Yes, the XML-RPC API will remain for now.

Currently, when performing pip search <packageName> under pypi.python.org, we get HTTP 302 and are redirected to pypi.python.org/pypi

Correct, the 302 redirect from https://pypi.python.org/ to https://pypi.python.org/pypi doesn't exist for pypi.org.

However when performing the same (pip search) under pypi.org we get HTTP error 404 while getting http://pypi.org/RPC2

I'm not sure exactly what index URL you're using here, but you should be using https://pypi.org/pypi:

$ pip search foobar -vvv --index https://pypi.org/pypi
Starting new HTTPS connection (1): pypi.org
https://pypi.org:443 "POST /pypi HTTP/1.1" 200 272
foobar (1.1)         - This is the FooBar  project. (foo is taken at PyPI -
                       hahaha)
django-foobar (1.0)  - Super awesome portable module for Django

yuvalreches commented 6 years ago

Hey @di

I get your point, but why redirect to an endpoint that doesn't exist?

Would it be possible to have the same redirect as https://pypi.python.org/ has for the time being (until XML-RPC v1 is deprecated)?

Meaning redirect from https://pypi.org/ to https://pypi.org/pypi instead of https://pypi.org/RPC2

It would help us a lot

di commented 6 years ago

I get your point, but why redirect to an endpoint that doesn't exist?

I'm not sure I follow. There is nothing on pypi.org that is redirecting to an endpoint that doesn't exist. The base URL for link is just misconfigured, e.g. it's using 'https://pypi.org' + '/' + 'RPC2' instead of 'https://pypi.org/pypi' + '/' + 'RPC2', where 'RPC2' is just the search query.

Would it be possible to have the same redirect as https://pypi.python.org/ has for the time being (until XML-RPC v1 is deprecated)?

Meaning redirect from https://pypi.org/ to https://pypi.org/pypi instead of https://pypi.org/RPC2

I don't think this is necessary and we're unlikely to add it. Your client should just use the correct search index instead.

yuvalreches commented 6 years ago

Sorry, my bad. You are right: the redirect to RPC2 happens upon pip search <package> -i https://pypi.org instead of using pip search ... -i pypi.org/pypi. It does look strange to me to have a redirect to a page that doesn't exist.

I'll explain our scenario: JFrog Artifactory performs the search to PyPI in case a repository is pointing at it. The registry url is configured by default (for the past several years) as https://pypi.python.org and upon each search - the request goes to this url.

When sending the search request (POST method) we rely on the redirect and perform the search on the path redirected to.

Now with the new registry version we don't get that redirect and the search fails.

Easiest way to see it: curl -XPOST -i https://pypi.python.org results in HTTP/2 302 location: https://pypi.python.org/pypi, which is the desired behaviour.

However, in Warehouse, curl -XPOST -i https://pypi.org results in HTTP/2 405

Setting the same redirect in Warehouse will allow Artifactory users to keep using the search function, without anything breaking when the full redirect of pypi.python.org to pypi.org takes place.

di commented 6 years ago

@yuvalreches After chatting with the other maintainers, we realized that the attempt to use /RPC2 is standard behavior for most XML-RPC clients when the root URL does not support XML-RPC. We decided to add this endpoint to Warehouse to duplicate the /pypi endpoint, which should fix your issues once merged (#3594).

However, Artifactory should probably still use the correct XML-RPC endpoint (either /pypi or /RPC2) directly when using the XML-RPC API, rather than letting your client test the root domain and then fall back on /RPC2. That fallback technically generates unnecessary requests, which are likely slowing down responses for your users.
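
For readers following along, the /RPC2 fallback described above can be sketched in a few lines. This is only an illustration of the convention, not the code of any particular client: `effective_xmlrpc_path` is a hypothetical helper, and the rule it encodes (fall back to /RPC2 only when the URL carries no path at all) is the behavior of Python's standard `xmlrpc.client` as far as I know. Other clients may differ.

```python
from urllib.parse import urlsplit


def effective_xmlrpc_path(index_url: str) -> str:
    """Hypothetical helper: the path an XML-RPC client would POST to.

    Assumption: the client falls back to /RPC2 when the configured URL
    has no path component, and otherwise uses the path as given.
    """
    path = urlsplit(index_url).path
    return path if path else "/RPC2"


for url in ("https://pypi.org", "https://pypi.org/pypi"):
    # A bare host yields the /RPC2 fallback; an explicit /pypi is kept.
    print(url, "->", effective_xmlrpc_path(url))
```

Under that assumption, configuring the index as https://pypi.org/pypi avoids the fallback entirely, which matches the advice to point the client at the endpoint directly.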

yuvalreches commented 6 years ago

Thank you @di

Currently we want to make sure Artifactory instances won't be broken due to the changes implemented in Warehouse, and the /pypi redirect will assure that :)

I see your PR is already merged, when can we expect to see the change in pypi.org?

We sure have that in our roadmap. Also, please feel free to reach out again when an RPC2 beta is operational so we can implement the changes needed in Artifactory.

di commented 6 years ago

Currently we want to make sure Artifactory instances won't be broken due to the changes implemented in Warehouse, and the /pypi redirect will assure that.

I think we are having communication issues. Let me be clear: there will not be a redirect from / to /pypi in Warehouse.

What we have added is an endpoint at /RPC2 that your client should be able to use. We have not added any redirects.

I see your PR is already merged, when can we expect to see the change in pypi.org?

It is live now.

yuvalreches commented 6 years ago

Artifactory's code relies only on the Location header that is returned upon POST pypi.org (we don't mind if it's /pypi or /rpc2).

Would it be possible to change the response of such requests? I see it still returns HTTP/2 405 instead of HTTP/2 302 location: https://pypi.org/rpc2

Such a redirect (as exists on pypi.python.org, though to /pypi) will ensure nothing breaks on our end.
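
To make the two positions in this exchange concrete, here is a minimal sketch of a client that honors a Location header when one is returned and otherwise falls back to a configured XML-RPC endpoint. It is not Artifactory's code. The function name `search_endpoint`, the `default_path` parameter, and the lowercase header key are all assumptions made for illustration.

```python
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_STATUSES = (301, 302, 307, 308)


def search_endpoint(base_url, status, headers, default_path="/pypi"):
    """Hypothetical helper: choose the URL to POST an XML-RPC search to.

    `headers` is assumed to be a dict with lowercase keys. If the probe
    of `base_url` answered with a redirect, follow its Location header;
    otherwise (for example on a 405) use the configured endpoint.
    """
    location = headers.get("location")
    if status in REDIRECT_STATUSES and location:
        return urljoin(base_url, location)
    return urljoin(base_url, default_path)


# Old behavior: the root URL answers 302 with a Location header.
print(search_endpoint("https://pypi.python.org", 302,
                      {"location": "https://pypi.python.org/pypi"}))
# New behavior: the root URL answers 405, so the fallback is used.
print(search_endpoint("https://pypi.org", 405, {}))
```

With a fallback like this the client no longer depends on the server issuing a redirect, and skipping the probe altogether would also save the extra request.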

mitar commented 6 years ago

Is the new API location really the same as the old one? I am sure I was getting results before for the following query, but now it returns an empty result:

import xmlrpc.client

client = xmlrpc.client.ServerProxy('https://pypi.python.org/pypi')
client.search({'keywords': 'd3m_primitive'})

Or do packages have to be (re)published for them to be visible through the new API location?

ewdurbin commented 6 years ago

@mitar we're tracking issues with our search index becoming empty (bug) at #3746, and issues with search oddities in general at #3717

di commented 6 years ago

More or less blocked on #284.