pulp / pulp_rpm

RPM support for Pulp Platform
https://pulpproject.org/pulp_rpm/
GNU General Public License v2.0

Remote Sync Strips Trailing Forward Slash - Results in 404 #3240

Open joey-grant opened 1 year ago

joey-grant commented 1 year ago

Summary

I have two Pulp servers: one serves as a primary (where I control package promotion, etc.) and the other simply syncs repositories from the primary. I am using nginx as my reverse proxy and am also using certguard (though I don't think it has any impact here). The problem is that my secondary server's RPM remote points to the primary's distribution URL, including the trailing slash, but pulp rpm repository sync seems to strip that slash, resulting in 404s.

Steps to reproduce

[root@primary ~]# pulp rpm distribution show --name application-eng
{
  "pulp_href": "/pulp/api/v3/distributions/rpm/rpm/ea16f53b-8a78-4395-8935-eaa7d96f06c7/",
  "pulp_created": "2023-08-09T15:28:09.236805Z",
  "base_path": "application-x86_64-eng",
  "base_url": "https://primary.env.company.com/pulp/content/application-x86_64-eng/",
  "content_guard": "/pulp/api/v3/contentguards/certguard/x509/b0fa42ca-5eff-4c0d-a53c-dfadf9268ae5/",
  "pulp_labels": {},
  "name": "application-eng",
  "repository": null,
  "publication": "/pulp/api/v3/publications/rpm/rpm/5f217017-ab21-4ed6-8b5e-96afab630321/"
}
[root@secondary ~]# pulp rpm remote show --name application-uri-eng
{
  "pulp_href": "/pulp/api/v3/remotes/rpm/rpm/8c4ee2f0-e095-4967-bc48-fa1a041d37e2/",
  "pulp_created": "2023-09-01T14:12:29.471030Z",
  "name": "application-uri-eng",
  "url": "https://primary.env.company.com/pulp/content/application-x86_64-eng/",
  "ca_cert": <redacted>,
  "client_cert": <redacted>,
  "tls_validation": true,
  "proxy_url": null,
  "pulp_labels": {},
  "pulp_last_updated": "2023-09-01T14:29:33.775680Z",
  "download_concurrency": null,
  "max_retries": null,
  "policy": "on_demand",
  "total_timeout": null,
  "connect_timeout": null,
  "sock_connect_timeout": null,
  "sock_read_timeout": null,
  "headers": null,
  "rate_limit": null,
  "hidden_fields": [
    {
      "name": "client_key",
      "is_set": true
    },
    {
      "name": "proxy_username",
      "is_set": false
    },
    {
      "name": "proxy_password",
      "is_set": false
    },
    {
      "name": "username",
      "is_set": false
    },
    {
      "name": "password",
      "is_set": false
    }
  ],
  "sles_auth_token": null
}

[root@secondary ~]# pulp rpm repository show --name application-eng
{
  "pulp_href": "/pulp/api/v3/repositories/rpm/rpm/36a955a4-f5d6-4b48-947f-46edbe396201/",
  "pulp_created": "2023-09-01T14:12:31.994781Z",
  "versions_href": "/pulp/api/v3/repositories/rpm/rpm/36a955a4-f5d6-4b48-947f-46edbe396201/versions/",
  "pulp_labels": {},
  "latest_version_href": "/pulp/api/v3/repositories/rpm/rpm/36a955a4-f5d6-4b48-947f-46edbe396201/versions/0/",
  "name": "application-eng",
  "description": "Mirror for: https://primary.env.company.com/pulp/content/application-x86_64-eng/",
  "retain_repo_versions": null,
  "remote": "/pulp/api/v3/remotes/rpm/rpm/8c4ee2f0-e095-4967-bc48-fa1a041d37e2/",
  "autopublish": true,
  "metadata_signing_service": null,
  "retain_package_versions": 0,
  "metadata_checksum_type": null,
  "package_checksum_type": null,
  "gpgcheck": 0,
  "repo_gpgcheck": 0,
  "sqlite_metadata": false
}

[root@secondary ~]# pulp rpm repository sync --name application-eng
Started background task /pulp/api/v3/tasks/676d1531-ba40-43f6-9c76-48bb3f4c4f87/
Error: Task /pulp/api/v3/tasks/676d1531-ba40-43f6-9c76-48bb3f4c4f87/ failed: '404, message='Not Found', url=URL('https://primary.env.company.com/pulp/content/application-x86_64-eng')'

[root@secondary ~]# curl --key pulp.key --cert pulp.pem https://primary.env.company.com/pulp/content/application-x86_64-eng/

<html>
<head><title>Index of /pulp/content/application-x86_64-eng/</title></head>
<body bgcolor="white">
<h1>Index of /pulp/content/application-x86_64-eng/</h1>
<hr><pre><a href="../">../</a>
<a href="Packages/">Packages/</a>                                                                                           29-Jun-2022 03:54
<a href="config.repo">config.repo</a>
<a href="repodata/">repodata/</a>                                                                                           25-Aug-2023 15:39
</pre><hr></body>
</html>

Expected behavior

I expected pulp rpm repository sync on the secondary to retain the trailing forward slash defined in the RPM remote.

Stacktrace/Error log

[root@primary ~]# tail -n2 /var/log/nginx/access.log
XXX.XXX.XXX.XXX - - [01/Sep/2023:15:19:58 +0000] "GET /pulp/content/application-x86_64-eng HTTP/1.1" 404 14 "-" "pulpcore/3.22.1 (cpython 3.8.11-final0, Linux x86_64) (aiohttp 3.8.1)"
XXX.XXX.XXX.XXX - - [01/Sep/2023:15:21:33 +0000] "GET /pulp/content/application-x86_64-eng/ HTTP/1.1" 200 649 "-" "curl/7.29.0"

Pulp and pulp-cli version info

[root@primary ~]# pulp status
{
  "versions": [
    {
      "component": "core",
      "version": "3.22.1",
      "package": "pulpcore"
    },
    {
      "component": "rpm",
      "version": "3.19.7",
      "package": "pulp-rpm"
    },
...
[root@secondary ~]# pulp --version
pulp3 command line interface, version 0.19.2

Additional context

mdellweg commented 1 year ago

It looks to me like the remote is configured correctly, and since the trailing slash is part of the URL there, I don't see how the CLI could be to blame either. Could you provide a full stack trace of this failure? You should be able to get it from pulp task show or from the server logs.

joey-grant commented 1 year ago

Sure thing, thanks for looking at this with me.

[root@secondary ~]# pulp task show --href /pulp/api/v3/tasks/676d1531-ba40-43f6-9c76-48bb3f4c4f87/
{
  "pulp_href": "/pulp/api/v3/tasks/676d1531-ba40-43f6-9c76-48bb3f4c4f87/",
  "pulp_created": "2023-09-01T15:19:58.646463Z",
  "state": "failed",
  "name": "pulp_rpm.app.tasks.synchronizing.synchronize",
  "logging_cid": "6cc165caf2204c5b97bf55399a0b057b",
  "started_at": "2023-09-01T15:19:58.769975Z",
  "finished_at": "2023-09-01T15:19:59.018980Z",
  "error": {
    "traceback": "  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulpcore/tasking/pulpcore_worker.py\", line 444, in _perform_task\n    result = func(*args, **kwargs)\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py\", line 486, in synchronize\n    remote_url = fetch_remote_url(remote, url)\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py\", line 305, in fetch_remote_url\n    remote_url = fetch_mirror(remote)\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py\", line 254, in fetch_mirror\n    result = downloader.fetch()\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulpcore/download/base.py\", line 175, in fetch\n    return done.pop().result()\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulpcore/download/http.py\", line 273, in run\n    return await download_wrapper()\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/backoff/_async.py\", line 151, in retry\n    ret = await target(*args, **kwargs)\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulpcore/download/http.py\", line 258, in download_wrapper\n    return await self._run(extra_data=extra_data)\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/downloaders.py\", line 117, in _run\n    self.raise_for_status(response)\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/pulp_rpm/app/downloaders.py\", line 102, in raise_for_status\n    response.raise_for_status()\n  File \"/usr/local/lib/pulp/lib64/python3.8/site-packages/aiohttp/client_reqrep.py\", line 1004, in raise_for_status\n    raise ClientResponseError(\n",
    "description": "404, message='Not Found', url=URL('https://primary.env.company.com/pulp/content/application-x86_64-eng')"
  },
  "worker": "/pulp/api/v3/workers/0cd785be-c0ad-4415-89e9-6a7c9d6161fb/",
  "parent_task": null,
  "child_tasks": [],
  "task_group": null,
  "progress_reports": [],
  "created_resources": [],
  "reserved_resources_record": [
    "/pulp/api/v3/repositories/rpm/rpm/36a955a4-f5d6-4b48-947f-46edbe396201/",
    "shared:/pulp/api/v3/remotes/rpm/rpm/8c4ee2f0-e095-4967-bc48-fa1a041d37e2/"
  ]
}
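The traceback field above is a single JSON string with escaped newlines, which is why it wraps so poorly when pasted into a terminal or issue. A minimal standard-library sketch that decodes pulp task show output into a readable report; it only relies on the state and error fields visible in the output above, and the file name task.json in the usage note is just an example.

```python
import json


def format_task_error(task_json: str) -> str:
    """Turn `pulp task show` JSON output into a readable error report.

    json.loads decodes the escaped newlines inside the traceback string,
    so each stack frame ends up on its own line.
    """
    task = json.loads(task_json)
    error = task.get("error") or {}  # "error" is null for successful tasks
    lines = [f"state: {task.get('state')}"]
    if error.get("description"):
        lines.append(f"description: {error['description']}")
    if error.get("traceback"):
        lines.append("traceback:")
        lines.append(error["traceback"].rstrip())
    return "\n".join(lines)
```

For example, save the output with pulp task show --href <task href> > task.json and then run print(format_task_error(open("task.json").read())).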
mdellweg commented 1 year ago

@dralley Does this look familiar to you? Would you agree to reassigning this issue to pulp_rpm?

dralley commented 1 year ago

I'm fine with reassigning it to pulp_rpm. I agree that it probably isn't a CLI issue, at least.

joey-grant commented 1 year ago

It appears that this line of code in synchronizing.py is the culprit. Why does it deliberately strip the trailing forward slash?

downloader = remote.get_downloader(url=remote.url.rstrip("/"), urlencode=False)
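To make the effect of that line concrete, here is a small standard-library sketch. It shows what rstrip("/") does to the configured remote URL and, purely as a general illustration (not a claim about how pulp_rpm builds its URLs), how relative resolution with urllib.parse.urljoin drops the last path segment once the slash is gone. As later comments point out, this line sits on the mirrorlist code path, so whether it matters depends on which path the sync actually takes.

```python
from urllib.parse import urljoin

# Remote URL as configured on the secondary (taken from the issue).
remote_url = "https://primary.env.company.com/pulp/content/application-x86_64-eng/"

# Equivalent of remote.url.rstrip("/") in the quoted line.
stripped = remote_url.rstrip("/")

# urljoin treats the last segment as a "file" unless the base ends in "/",
# so resolving a relative path against the stripped URL loses the base path.
with_slash = urljoin(remote_url, "repodata/repomd.xml")
without_slash = urljoin(stripped, "repodata/repomd.xml")

print(stripped)       # .../pulp/content/application-x86_64-eng
print(with_slash)     # .../application-x86_64-eng/repodata/repomd.xml
print(without_slash)  # .../pulp/content/repodata/repomd.xml
```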

ipanova commented 1 year ago

What happens here is that your URL has been identified as a mirrorlist, which it is not: https://github.com/pulp/pulp_rpm/blob/748b3dde057bfc90623e186842b14538a0162049/pulp_rpm/app/tasks/synchronizing.py#L301
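The control flow described here ("repository first, mirrorlist as a fallback") can be sketched with an injected fetch function, which shows why the final 404 reports the slash-less URL even though the remote has a slash. This is a hypothetical illustration, not pulp_rpm's code: resolve_remote, NotFound, and fake_primary are made-up names, and the simulated server assumes that repomd.xml could not be fetched, which is exactly what the maintainers are asking to confirm.

```python
from typing import Callable


class NotFound(Exception):
    """Stand-in for an HTTP 404 (aiohttp raises ClientResponseError instead)."""


def resolve_remote(url: str, fetch: Callable[[str], str]) -> str:
    """Hypothetical 'repository first, mirrorlist fallback' flow.

    `fetch` returns the body of a URL or raises NotFound.
    """
    base = url if url.endswith("/") else url + "/"
    try:
        # Step 1: treat the URL as a plain repository and look for repomd.xml.
        fetch(base + "repodata/repomd.xml")
        return url
    except NotFound:
        pass
    # Step 2: repomd.xml was not found, so assume the URL is a mirrorlist and
    # fetch it directly with the trailing slash stripped.
    body = fetch(url.rstrip("/"))
    # Use the first non-empty, non-comment line as the mirror URL.
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    raise NotFound(url)


REMOTE = "https://primary.env.company.com/pulp/content/application-x86_64-eng/"


def fake_primary(u: str) -> str:
    # Assumed server state for illustration: only the slash-terminated index
    # answers; repomd.xml and the slash-less path both return 404.
    if u == REMOTE:
        return "<html>index</html>"
    raise NotFound(u)


try:
    resolve_remote(REMOTE, fake_primary)
except NotFound as err:
    print("404:", err)  # 404: https://primary.env.company.com/pulp/content/application-x86_64-eng
```

In this sketch the slash only disappears on the fallback path, so the visible symptom (a slash-less URL in the error) is a consequence of the repomd.xml lookup failing first.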

Can you share with us the contents of /repodata? Is there repomd.xml present?
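One way to answer this from the secondary, using the same client certificate as the curl command in the report, is sketched below with only the standard library. The certificate and key file names come from that curl command; the optional CA bundle argument and the 30-second timeout are assumptions.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urljoin


def repomd_url(base_url: str) -> str:
    """Return <base_url>/repodata/repomd.xml, adding the trailing slash if missing."""
    if not base_url.endswith("/"):
        base_url += "/"
    return urljoin(base_url, "repodata/repomd.xml")


def repomd_status(base_url: str, cert: str, key: str,
                  cafile: Optional[str] = None) -> int:
    """GET repomd.xml with a client certificate (like curl --cert/--key)
    and return the HTTP status code."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    try:
        with urllib.request.urlopen(repomd_url(base_url), context=ctx,
                                    timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; report the code instead.
        return err.code
```

For example, repomd_status("https://primary.env.company.com/pulp/content/application-x86_64-eng/", cert="pulp.pem", key="pulp.key") should return 200 if repomd.xml is being served; a 404 would suggest the repodata is not available at that path.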

dralley commented 12 months ago

Related: https://github.com/pulp/pulpcore/issues/3173

pedro-psb commented 8 months ago

Hello @munkey01, @ipanova has a point: this error is only raised if there is an error finding repodata/repomd.xml (see the context). It would be really helpful to know whether repomd.xml is there or not.

Another possible reason for get_repomd_file not "finding" the file (i.e. raising ClientResponseError) could be something in the downloader configuration. It's a long shot, but I can look into that if the repomd.xml file is confirmed to be available at repodata/repomd.xml.

I don't think this is related to slashes, although it may look that way at first sight.

P.S. Just for additional context: the 404 is expected when requesting https://primary.env.company.com/pulp/content/application-x86_64-eng (no slash) on pulpcore versions before 3.40.0, but that isn't what matters here: the sync task only requests that URL because it thinks the remote is a mirrorlist in the first place, as Ina already said.
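To tell whether a server falls into the "before pulpcore 3.40.0" case, the core version can be read from pulp status output, whose versions list is shown earlier in this issue. A minimal sketch; it keeps only the first three numeric version parts and assumes a component named core is present. The function names are illustrative.

```python
import json
import re


def parse_version(text: str) -> tuple:
    """'3.22.1' -> (3, 22, 1); only the first three numeric parts are kept."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def core_version(status_json: str) -> tuple:
    """Extract the pulpcore version from `pulp status` JSON output."""
    status = json.loads(status_json)
    for entry in status["versions"]:
        if entry.get("component") == "core":
            return parse_version(entry["version"])
    raise KeyError("no 'core' component in status output")


def slashless_base_path_404s(status_json: str) -> bool:
    """True for pulpcore older than 3.40.0, where (per the comment above) a
    distribution base path requested without its trailing slash returns 404."""
    return core_version(status_json) < (3, 40, 0)
```

With the primary's reported core version of 3.22.1, slashless_base_path_404s would return True, consistent with the 404 in the nginx access log.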