pulp / pulp_python

A Pulp plugin to support Python packages
GNU General Public License v2.0

Unable to set up pulp-python sync from JFrog local pypi repository #669

Open grzleadams opened 4 months ago

grzleadams commented 4 months ago

Version Deployed via Operator:

{
  "versions": [
    {
      "component": "core",
      "version": "3.49.1",
      "package": "pulpcore",
      "module": "pulpcore.app",
      "domain_compatible": true
    },
    {
      "component": "python",
      "version": "3.11.0",
      "package": "pulp-python",
      "module": "pulp_python.app",
      "domain_compatible": false
    },

Describe the bug

I set up a Pulp Python remote pointing at a local JFrog PyPI repository ("url": "https://<redacted>/artifactory/api/pypi/pypi-local/simple"), providing valid credentials (username and password), and linked it with a Pulp Python repository and distribution. However, it appears that either the credentials are not being passed in the requests during the sync or the URL is malformed. From the JFrog logs (note the non_authenticated_user and the 401):

2024-05-14T16:33:15.472Z|<redacted>|<redacted>|non_authenticated_user|GET|/api/pypi/pypi-local/simple/pypi/<redacted>/json|401|-1|0|1|bandersnatch/6.1.0 (cpython 3.9.18-final0, Linux x86_64) (aiohttp 3.9.3)

For what it's worth, that URL also looks strange... I would expect .../simple/<redacted>/json, not .../simple/pypi/<redacted>/json. It's worth noting that Artifactory requires the username/password to be included in the URL but Pulp prevents that:

Error: {"url":["The remote url contains username or password. Please use remote username or password instead."]}

To Reproduce

Steps to reproduce the behavior:

  1. Create a local pypi repository in JFrog.
  2. Create a Pulp remote pointing at the JFrog pypi repository (and then set up a Pulp repository and distribution as needed to sync it).
  3. Attempt an authenticated sync.

Expected behavior

The sync should complete successfully.

Additional context

N/A

gerrod3 commented 4 months ago

I moved the issue to the pulp_python repository since the issue is with this plugin.

I would try removing the simple/ part from the remote URL and retrying the sync. I am not entirely sure how JFrog sets up their PyPI repositories, but assuming that https://<host>/artifactory/pypi/pypi-local/ is the base index page for your repository, that is the URL you should use for your remote.

It is still possible that we do not fully support authenticated syncs, at least through normal use of the remote's username and password fields. If you are still getting 401s, try using the https://<username>:<password>@<host>/.../ URL form.

grzleadams commented 4 months ago

Unfortunately, I did try removing simple/ (and a bunch of other permutations of the URL) and nothing worked. I did try to set the URL to include the credentials but Pulp wouldn't let me (I get the url contains username or password error I mentioned before). Is there a way to work around that/set it directly on the remote without using the CLI/API (I assume the validation happens either way)?

gerrod3 commented 4 months ago

If you want to set the URL on the remote directly, without validation, you can do it through the shell. On the Pulp instance, run pulpcore-manager shell_plus. This should bring up a Python shell with some classes already imported. Try:

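# Look up the remote by name and overwrite its URL through the ORM,
# which should skip the URL validation that the REST API performs.
py_remote = Remote.objects.get(name="your_python_remote_name")
py_remote.url = "https://<username>:<password>@<host>/artifactory/pypi/pypi-local/"
py_remote.save()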

This should bypass the validation done through the API.
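One caveat when embedding credentials in a URL: characters such as "@", ":" or "/" in the username or password need to be percent-encoded, otherwise the URL parses incorrectly. Below is a minimal sketch of building such a URL with the standard library. The helper name with_credentials, the host and the credentials are all hypothetical and not part of Pulp; the sketch also assumes a plain hostname rather than an IPv6 literal.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url: str, username: str, password: str) -> str:
    """Return `url` with percent-encoded credentials embedded in the netloc.

    Hypothetical helper for illustration; not part of Pulp or bandersnatch.
    """
    parts = urlsplit(url)
    # safe="" so that "@", ":" and "/" inside the credentials are escaped too.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, f"{userinfo}@{host}", parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only.
    print(
        with_credentials(
            "https://repo.example.com/artifactory/api/pypi/pypi-local/",
            "svc-user",
            "p@ss:w/rd",
        )
    )
```

The resulting string could then be assigned to py_remote.url in the shell session above.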

grzleadams commented 4 months ago

Is shell_plus available in 3.49.1/the minimal image?

Unknown command: 'shell_plus'. Did you mean shell?

gerrod3 commented 4 months ago

It might not be. Instead, use its suggestion, pulpcore-manager shell, and add this line at the top: from pulpcore.app.models import Remote.

grzleadams commented 4 months ago

Setting the credentials in the URL seems to have worked, so we're not getting the unauthenticated user business anymore.

2024-05-16T18:01:01.122Z|<thread_id>|<ipaddress>|<authenticated_user>|GET|/api/pypi/pypi-local/simple/pypi/<module_name>/json|404|-1|0|3|bandersnatch/6.1.0 (cpython 3.9.18-final0, Linux x86_64) (aiohttp 3.9.3)

The 404 appears to be related to both simple/pypi and /json; fixing both gives an HTML response that lists all available module versions. Are those two things required by the PyPI API spec?

gerrod3 commented 4 months ago

Both /simple/ and /pypi/<package_name>/json are PyPI APIs. /simple/ is used by pip for package installs and /pypi/* is used by bandersnatch (the tool Pulp uses under the hood) for syncing. When specifying the URL for syncing, you should use only the base URL of your index, without /simple/ or /pypi/*, because bandersnatch adds the /pypi suffix itself.
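To illustrate why the base URL matters, here is a small sketch of how the two endpoints are derived from it. This is not bandersnatch's actual code; index_endpoints is a hypothetical helper that only mirrors the paths described above, and the host is a placeholder.

```python
def index_endpoints(base_url: str, package: str) -> dict[str, str]:
    """Derive the per-package endpoints from an index base URL.

    Hypothetical helper; it only mirrors the /simple/ and /pypi/<name>/json
    paths discussed in this thread.
    """
    base = base_url.rstrip("/")
    return {
        "simple": f"{base}/simple/{package}/",  # used by pip for installs
        "json": f"{base}/pypi/{package}/json",  # used for syncing metadata
    }


if __name__ == "__main__":
    good = "https://repo.example.com/artifactory/api/pypi/pypi-local"
    bad = good + "/simple"  # remote URL that already ends in /simple
    print(index_endpoints(good, "foo")["json"])
    print(index_endpoints(bad, "foo")["json"])
```

With the remote URL ending in /simple, the second call yields a .../simple/pypi/foo/json path, which matches the odd-looking request in the JFrog logs earlier in this thread.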

grzleadams commented 4 months ago

Do you know if bandersnatch will follow redirects? Apparently JFrog is doing something with their reverse proxy that requires it (for example, just to curl the simple index you need -L).

gerrod3 commented 4 months ago

It should follow redirects, as should pip.

grzleadams commented 4 months ago

I looked through the worker logs and it looks like Pulp finds the package list (there are in fact 26 packages to sync) but hits .netrc errors when trying to pull them:

pulp []: pulpcore.tasking.tasks:INFO: Starting task <task_id>
pulp []: bandersnatch:INFO: Initialized release plugin blocklist_release, filtering []
pulp []: bandersnatch.mirror:INFO: Syncing with https://<url>/artifactory/api/pypi/pypi-local.
pulp []: pulp_python.app.tasks.sync:INFO: Attempt 0 to get package list from https://<url>/artifactory/api/pypi/pypi-local
pulp []: pulp_python.app.tasks.sync:INFO: Syncing all packages.
pulp []: aiohttp.client:WARNING: Could not read .netrc file: [Errno 2] No such file or directory: '.fake-netrc'
pulp []: pulp_python.app.tasks.sync:INFO: Attempt 1 to get package list from https://<url>/artifactory/api/pypi/pypi-local
pulp []: pulp_python.app.tasks.sync:INFO: Syncing all packages.
pulp []: aiohttp.client:WARNING: Could not read .netrc file: [Errno 2] No such file or directory: '.fake-netrc'
pulp []: pulp_python.app.tasks.sync:INFO: Attempt 2 to get package list from https://<url>/artifactory/api/pypi/pypi-local
pulp []: pulp_python.app.tasks.sync:INFO: Syncing all packages.
pulp []: aiohttp.client:WARNING: Could not read .netrc file: [Errno 2] No such file or directory: '.fake-netrc'
pulp []: pulp_python.app.tasks.sync:INFO: Failed to get package list using XMLRPC, trying parse simple page.
pulp []: bandersnatch.mirror:INFO: No project filters are enabled. Skipping filtering
pulp []: pulp_python.app.tasks.sync:INFO: 26 packages to sync.
pulp []: bandersnatch.mirror:INFO: No metadata filters are enabled. Skipping metadata filtering
pulp []: bandersnatch.mirror:INFO: No release file filters are enabled. Skipping release file filtering
pulp []: bandersnatch.package:INFO: Fetching metadata for package: <module> (serial 0)
pulp []: aiohttp.client:WARNING: Could not read .netrc file: [Errno 2] No such file or directory: '.fake-netrc'
pulp []: bandersnatch.package:INFO: <module> no longer exists on PyPI
<snip>
pulp []: pulpcore.tasking.tasks:INFO: Task completed <task_id>

gerrod3 commented 4 months ago

Those "Could not read .netrc file" warnings are harmless and shouldn't affect the sync. They were fixed in pulp_python 3.11.1.

Can you check the output of the sync task? The logs say it completed, so it should give info on how many packages it synced. pulp task show --href <task_href> or --uuid <task_id>.

If the number of synced packages is zero, can you try to curl https://<url>/artifactory/api/pypi/pypi-local/pypi/<package_name>/json and see whether it responds with JSON containing that package's metadata? This should be the endpoint that the sync requests for each package.
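The same check can be scripted. The sketch below fetches a URL with HTTP Basic auth (urllib follows redirects by default, like curl -L) and tests whether the body looks like package metadata. It assumes the response follows the PyPI JSON layout, with top-level "info" and "releases" keys; the helper names, host and credentials are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def fetch(url: str, username: str, password: str, timeout: float = 30.0) -> bytes:
    """GET `url` with HTTP Basic auth; urllib follows redirects by default."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def looks_like_package_metadata(body: bytes) -> bool:
    """Return True if `body` is a JSON object with "info" and "releases" keys.

    Assumption: the index answers in the PyPI JSON layout. An HTML listing
    or an error page fails this check.
    """
    try:
        data = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(data, dict) and "info" in data and "releases" in data


if __name__ == "__main__":
    # Placeholder URL and credentials; replace with real values before running.
    url = "https://repo.example.com/artifactory/api/pypi/pypi-local/pypi/foo/json"
    print(looks_like_package_metadata(fetch(url, "svc-user", "secret")))
```

A False result for an endpoint that returns HTTP 200 would suggest the server is answering with something other than the JSON metadata the sync expects, such as an HTML listing.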

gerrod3 commented 3 months ago

@grzleadams Did you ever get the sync to work?

grzleadams commented 3 months ago

No, we were in a bit of a time crunch so I just downloaded all the files and added them to Pulp manually.

gerrod3 commented 3 months ago

I see. Well, when you have time, I am willing to keep helping to resolve this issue; otherwise we can close it if it is no longer needed.