pulp / pulp_container

Pulp Container Registry
https://docs.pulpproject.org/pulp_container/
GNU General Public License v2.0
23 stars 44 forks source link

Replication of a pull through cache fails #1731

Closed PotentialIngenuity closed 1 month ago

PotentialIngenuity commented 2 months ago

Version old version: 3.53.0 new version: 3.57.0

Describe the bug I upgraded pulp today and replication started to fail for containers. It had previously worked and I can still see the content on both pulp servers.

I initiate a replication using the https://localhost{{ upstream_href }}replicate/ endpoint and receive a 401 error for all of the container repositories present.

pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO: Starting task 019133df-9b4f-7309-b537-343a4bf604b8
pulp [4bb148935ef14c1f96c97a89455640b0]: pulp_container.app.tasks.synchronize:INFO: Synchronizing: repository=docker-cache/library/redis remote=docker-cache/library/redis
Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://repo.company.com/v2/'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:ERROR: Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://repo.company.com/v2/'))
pulp [4bb148935ef14c1f96c97a89455640b0]: pulp_container.app.downloaders:INFO: Updating bearer token
Backing off download_wrapper(...) for 0.3s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:de26bac5fc005b26a0b5d3d4444e6ade67c19f941c2b30c842d56824547ff0b4'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:INFO: Backing off download_wrapper(...) for 0.3s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:de26bac5fc005b26a0b5d3d4444e6ade67c19f941c2b30c842d56824547ff0b4'))
Backing off download_wrapper(...) for 0.9s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:347ce90d87350618fe8d16afa0ad30168823ba61b78ae56f06e6018227c1d2eb'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:INFO: Backing off download_wrapper(...) for 0.9s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:347ce90d87350618fe8d16afa0ad30168823ba61b78ae56f06e6018227c1d2eb'))
Backing off download_wrapper(...) for 0.1s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:b695a3453560aacd42060f43ccf1cbd7d20412f50126ca6a469af38d3fe1e5b1'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:INFO: Backing off download_wrapper(...) for 0.1s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:b695a3453560aacd42060f43ccf1cbd7d20412f50126ca6a469af38d3fe1e5b1'))
Backing off download_wrapper(...) for 0.8s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:0f96f108f0685fbb71209ed9fbf7aa42f8ddc7c51402664f9e0e1d0931514373'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:INFO: Backing off download_wrapper(...) for 0.8s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:0f96f108f0685fbb71209ed9fbf7aa42f8ddc7c51402664f9e0e1d0931514373'))
Backing off download_wrapper(...) for 0.4s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:a1364f37eb38f24d8321c8e71a8a905efbb35a7d51418f7b2e2e733b39a02ea8'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:INFO: Backing off download_wrapper(...) for 0.4s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:a1364f37eb38f24d8321c8e71a8a905efbb35a7d51418f7b2e2e733b39a02ea8'))
Backing off download_wrapper(...) for 0.6s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:fd1a2ab224be08d16bd6af65b06e11bbb13159d8ad4d9e80ceb5688adb50ce47'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:INFO: Backing off download_wrapper(...) for 0.6s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:fd1a2ab224be08d16bd6af65b06e11bbb13159d8ad4d9e80ceb5688adb50ce47'))
Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:3f60e23d88cc5c26a0619a978293ab97c20dc0cea4ddd0784e502055b5873598'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:ERROR: Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 404, message='Not Found', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:3f60e23d88cc5c26a0619a978293ab97c20dc0cea4ddd0784e502055b5873598'))
pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO: Task 019133df-9b4f-7309-b537-343a4bf604b8 failed (404, message='Not Found', url=URL('https://repo.company.com/pulp/container/docker-cache/library/redis/blobs/sha256:3f60e23d88cc5c26a0619a978293ab97c20dc0cea4ddd0784e502055b5873598'))
pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO:   File "/usr/local/lib/python3.9/site-packages/pulpcore/tasking/tasks.py", line 75, in _execute_task
    result = func(*args, **kwargs)

  File "/usr/local/lib/python3.9/site-packages/pulp_container/app/tasks/synchronize.py", line 41, in synchronize
    return dv.create()

  File "/usr/local/lib/python3.9/site-packages/pulpcore/plugin/stages/declarative_version.py", line 161, in create
    loop.run_until_complete(pipeline)

  File "/usr/lib64/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.9/site-packages/pulpcore/plugin/stages/api.py", line 220, in create_pipeline
    await asyncio.gather(*futures)

  File "/usr/local/lib/python3.9/site-packages/pulpcore/plugin/stages/api.py", line 41, in __call__
    await self.run()

  File "/usr/local/lib/python3.9/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 186, in run
    pb.done += task.result()  # download_count

  File "/usr/local/lib/python3.9/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 241, in _handle_content_unit
    await asyncio.gather(*downloaders_for_content)

  File "/usr/local/lib/python3.9/site-packages/pulpcore/plugin/stages/models.py", line 119, in download
    raise e

  File "/usr/local/lib/python3.9/site-packages/pulpcore/plugin/stages/models.py", line 111, in download
    download_result = await downloader.run(extra_data=self.extra_data)

  File "/usr/local/lib/python3.9/site-packages/pulpcore/download/http.py", line 269, in run
    return await download_wrapper()

  File "/usr/local/lib/python3.9/site-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)

  File "/usr/local/lib/python3.9/site-packages/pulpcore/download/http.py", line 254, in download_wrapper
    return await self._run(extra_data=extra_data)

  File "/usr/local/lib/python3.9/site-packages/pulp_container/app/downloaders.py", line 81, in _run
    response.raise_for_status()

  File "/usr/local/lib64/python3.9/site-packages/aiohttp/client_reqrep.py", line 1070, in raise_for_status
    raise ClientResponseError(

pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO: Starting task 019133df-9b60-7e4c-b2ea-1a18a3274000
pulp [4bb148935ef14c1f96c97a89455640b0]: pulp_container.app.tasks.synchronize:INFO: Synchronizing: repository=docker-cache/library/alpine remote=docker-cache/library/alpine
Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://repo.company.com/v2/'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:ERROR: Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://repo.company.com/v2/'))
pulp [4bb148935ef14c1f96c97a89455640b0]: pulp_container.app.downloaders:INFO: Updating bearer token
pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO: Task completed 019133df-9b60-7e4c-b2ea-1a18a3274000
pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO: Starting task 019133df-9b6f-78c4-908d-d2d38d04b8a2
pulp [4bb148935ef14c1f96c97a89455640b0]: pulp_container.app.tasks.synchronize:INFO: Synchronizing: repository=quay-cache/ansible/awx-ee remote=quay-cache/ansible/awx-ee
Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://repo.company.com/v2/'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:ERROR: Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://repo.company.com/v2/'))
pulp [4bb148935ef14c1f96c97a89455640b0]: pulp_container.app.downloaders:INFO: Updating bearer token
pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO: Task completed 019133df-9b6f-78c4-908d-d2d38d04b8a2
pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO: Starting task 019133df-9b7f-7b34-acde-5a398dd4b4bc
pulp [4bb148935ef14c1f96c97a89455640b0]: pulp_container.app.tasks.synchronize:INFO: Synchronizing: repository=docker-cache/library/busybox remote=docker-cache/library/busybox
Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://repo.company.com/v2/'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:ERROR: Giving up download_wrapper(...) after 1 tries (aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://repo.company.com/v2/'))
pulp [4bb148935ef14c1f96c97a89455640b0]: pulp_container.app.downloaders:INFO: Updating bearer token
Backing off download_wrapper(...) for 0.2s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/busybox/manifests/latest'))
pulp [4bb148935ef14c1f96c97a89455640b0]: backoff:INFO: Backing off download_wrapper(...) for 0.2s (aiohttp.client_exceptions.ClientResponseError: 500, message='Internal Server Error', url=URL('https://repo.company.com/pulp/container/docker-cache/library/busybox/manifests/latest'))
pulp [4bb148935ef14c1f96c97a89455640b0]: pulpcore.tasking.tasks:INFO: Task completed 019133df-9b7f-7b34-acde-5a398dd4b4bc

To Reproduce Try to replicate a pull through cache

Expected behavior Replication should be successful

Additional context

lubosmj commented 2 months ago

Can you also share the version of pulp-container, please?

lubosmj commented 2 months ago

May we also see the traceback for aiohttp.client_exceptions.ClientResponseError: 500? It should be logged on your upstream server.

PotentialIngenuity commented 2 months ago

pulp-container is version: 2.21.0

Here is a traceback for the 500

[2024-08-08 21:22:36 +0000] [27] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/pulp_container/app/downloaders.py", line 81, in _run
    response.raise_for_status()
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/client_reqrep.py", line 1070, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 401, message='Unauthorized', url=URL('https://registry-1.docker.io/v2/library/redis/blobs/sha256:fd1a2ab224be08d16bd6af65b06e11bbb13159d8ad4d9e80ceb5688adb50ce47')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_app.py", line 543, in _handle
    resp = await handler(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_middlewares.py", line 114, in impl
    return await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/authentication.py", line 48, in authenticate
    return await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/instrumentation.py", line 230, in middleware
    resp = await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 352, in cached_function
    response = await self.make_entry(
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 395, in make_entry
    response = await handler(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulp_container/app/registry.py", line 342, in get_by_digest
    return await self._stream_content_artifact(request, web.StreamResponse(), ca)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/handler.py", line 820, in _stream_content_artifact
    response = await self._stream_remote_artifact(request, response, remote_artifact)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/handler.py", line 1112, in _stream_remote_artifact
    download_result = await downloader.run()
  File "/usr/local/lib/python3.9/site-packages/pulpcore/download/http.py", line 269, in run
    return await download_wrapper()
  File "/usr/local/lib/python3.9/site-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/download/http.py", line 254, in download_wrapper
    return await self._run(extra_data=extra_data)
  File "/usr/local/lib/python3.9/site-packages/pulp_container/app/downloaders.py", line 94, in _run
    await self.update_token(response_auth_header, this_token, repo_name)
  File "/usr/lib64/python3.9/asyncio/locks.py", line 120, in acquire
    await fut
RuntimeError: Task <Task pending name='Task-10990' coro=<RequestHandler._handle_request() running at /usr/local/lib64/python3.9/site-packages/aiohttp/web_protocol.py:452> cb=[<TaskWakeupMethWrapper object at 0x7fd2657047f0>()]> got Future <Future pending> attached to a different loop
::ffff:172.18.0.11 [08/Aug/2024:21:22:33 +0000] "GET /pulp/container/docker-cache/library/redis/blobs/sha256:fd1a2ab224be08d16bd6af65b06e11bbb13159d8ad4d9e80ceb5688adb50ce47 HTTP/1.0" 500 225 "-" "pulpcore/3.57.0 (cpython 3.9.19-final0, Linux x86_64) (aiohttp 3.9.5)"
pulp [None]: pulp_container.app.downloaders:INFO: Updating bearer token
[2024-08-08 21:22:39 +0000] [27] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_app.py", line 543, in _handle
    resp = await handler(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_middlewares.py", line 114, in impl
    return await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/authentication.py", line 48, in authenticate
    return await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/instrumentation.py", line 230, in middleware
    resp = await handler(request)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 352, in cached_function
    response = await self.make_entry(
  File "/usr/local/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 395, in make_entry
    response = await handler(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulp_container/app/registry.py", line 342, in get_by_digest
    return await self._stream_content_artifact(request, web.StreamResponse(), ca)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/handler.py", line 820, in _stream_content_artifact
    response = await self._stream_remote_artifact(request, response, remote_artifact)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/handler.py", line 1112, in _stream_remote_artifact
    download_result = await downloader.run()
  File "/usr/local/lib/python3.9/site-packages/pulpcore/download/http.py", line 269, in run
    return await download_wrapper()
  File "/usr/local/lib/python3.9/site-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/download/http.py", line 254, in download_wrapper
    return await self._run(extra_data=extra_data)
  File "/usr/local/lib/python3.9/site-packages/pulp_container/app/downloaders.py", line 107, in _run
    to_return = await self._handle_response(response)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/download/http.py", line 201, in _handle_response
    await self.headers_ready_callback(response.headers)
  File "/usr/local/lib/python3.9/site-packages/pulpcore/content/handler.py", line 1072, in handle_response_headers
    await response.prepare(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_response.py", line 417, in prepare
    return await self._start(request)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_response.py", line 425, in _start
    await self._write_headers()
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/web_response.py", line 505, in _write_headers
    await writer.write_headers(status_line, self._headers)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/http_writer.py", line 130, in write_headers
    self._write(buf)
  File "/usr/local/lib64/python3.9/site-packages/aiohttp/http_writer.py", line 75, in _write
    raise ConnectionResetError("Cannot write to closing transport")
ConnectionResetError: Cannot write to closing transport
lubosmj commented 2 months ago

The 500 error has been reported by multiple users. We are going to work on it. Seems to be unrelated to pulp-container.

May I ask you to share the parameters you had used when creating the replica? In 2.21, we introduced the replication feature to pulp-container. The issue might be related to the parameters used.

PotentialIngenuity commented 2 months ago

These are my replica settings

{
    "api_root": "/pulp/",
    "base_url": "https://repo.company.com",
    "ca_cert": null,
    "client_cert": null,
    "domain": null,
    "hidden_fields": [
        {
            "is_set": false,
            "name": "client_key"
        },
        {
            "is_set": true,
            "name": "username"
        },
        {
            "is_set": true,
            "name": "password"
        }
    ],
    "name": "repo",
    "pulp_created": "2024-04-26T22:10:40.934799Z",
    "pulp_href": "/pulp/api/v3/upstream-pulps/018f1c75-eaa6-7734-b7de-f7afe48dff66/",
    "pulp_label_select": "",
    "pulp_last_updated": "2024-04-26T22:10:40.934826Z",
    "tls_validation": false
}
lubosmj commented 2 months ago

Are the distributions in question marked as private in upstream? Can you verify you can access them and pull images by using the same credentials as configured on the replica?

lubosmj commented 2 months ago

Here are some findings with regard to the HTTP 500 error. If the timeout on the gunicorn worker is small, it can close the connection when serving larger files through a CDN. Can you try setting the timeout to a larger value, like 600, to see if it resolves the HTTP 500 error?

PotentialIngenuity commented 2 months ago

After I increased the timeout I noticed several of the containers would show:

pulp [a0b73760ce164b47bc3cb0b7a696f924]: pulpcore.tasking.tasks:INFO: Task 019151cc-cca1-7cef-b729-d1997573ef0e failed (404, message='Not Found', url=URL('https://repo.company.com/pulp/container/docker-cache/zabbix/zabbix-agent/blobs/sha256:9c0412900a89e222d2f8dc0afd2850a2e390ab6ee1d42b5ef4c3987bb65c55b2'))

I wondered if the data had expired out so I pulled them through the cache again and then the replication worked again.

lubosmj commented 2 months ago

Is everything working properly now? What was the root cause of the issue? Content in pulp-container should not expire since it is stored in a repository. This is different compared to pulp-python.

When it comes to pushing/syncing larger images, we encourage users to reasonably increase the token expiration time or fine-tune timeouts for workers.

ipanova commented 2 months ago

When it comes to pushing/syncing larger images, we encourage users to reasonably increase the token expiration time or fine-tune timeouts for workers.

Just for pushing, when syncing we cannot influence expiration time of the external token auth service.

PotentialIngenuity commented 1 month ago

I was able to get the sync to be successful. Pulling the images through the cache again was my solution to the 404s on the container blobs. The 500s resolved themselves. I also had one private container distribution that was able to sync after i set it to public. This may be another bug. I was able to manually get a token and authenticate to the private distribution.