pulp / pulpcore

Pulp 3 pulpcore package https://pypi.org/project/pulpcore/
GNU General Public License v2.0

Replication - support limiting number of concurrent syncs #4271

Open PotentialIngenuity opened 1 year ago

PotentialIngenuity commented 1 year ago

The UpstreamPulp config should have an additional field called 'download_concurrency'. It should be used to set the download_concurrency of the remotes created by the replication task.
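A purely hypothetical sketch of what the proposal could look like when creating an UpstreamPulp over the REST API. The download_concurrency field does not exist yet, and the hostname, credentials, and other values are placeholders:

```python
# Hypothetical sketch only -- the download_concurrency field on UpstreamPulp
# does NOT exist yet; hostname and credentials are placeholders.
import requests

requests.post(
    "https://replica.example.com/pulp/api/v3/upstream-pulps/",
    auth=("admin", "password"),
    json={
        "name": "upstream",
        "base_url": "https://upstream.example.com",
        "api_root": "/pulp/",
        "username": "admin",
        "password": "password",
        "download_concurrency": 5,  # proposed field, passed on to the created remotes
    },
)
```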

Version core - 3.30.0

Describe the bug

This error appears on several of the sync jobs out of the hundreds that occur.

pulp [70d9671d002e4e8fba226e4dc1d248eb]: pulp_rpm.app.tasks.synchronizing:INFO: Synchronizing: repository=rhel8/8/x86_64/codeready-builder/os/2023/31 remote=rhel8/8/x86_64/codeready-builder/os/2023/31
Backing off download_wrapper(...) for 0.9s (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
pulp [70d9671d002e4e8fba226e4dc1d248eb]: backoff:INFO: Backing off download_wrapper(...) for 0.9s (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
Backing off download_wrapper(...) for 0.3s (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
pulp [70d9671d002e4e8fba226e4dc1d248eb]: backoff:INFO: Backing off download_wrapper(...) for 0.3s (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
Backing off download_wrapper(...) for 2.4s (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
pulp [70d9671d002e4e8fba226e4dc1d248eb]: backoff:INFO: Backing off download_wrapper(...) for 2.4s (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
Backing off download_wrapper(...) for 6.2s (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
pulp [70d9671d002e4e8fba226e4dc1d248eb]: backoff:INFO: Backing off download_wrapper(...) for 6.2s (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
Giving up download_wrapper(...) after 5 tries (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
pulp [70d9671d002e4e8fba226e4dc1d248eb]: backoff:ERROR: Giving up download_wrapper(...) after 5 tries (aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed)
pulp [70d9671d002e4e8fba226e4dc1d248eb]: pulpcore.tasking.tasks:INFO: Task 0189e4df-bb9a-7cee-9f9a-efc564971651 failed (Response payload is not completed)
pulp [70d9671d002e4e8fba226e4dc1d248eb]: pulpcore.tasking.tasks:INFO:   File "/usr/local/lib/python3.8/site-packages/pulpcore/tasking/tasks.py", line 65, in _execute_task
    result = func(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/pulp_rpm/app/tasks/synchronizing.py", line 569, in synchronize
    repo_version = dv.create() or repo.latest_version()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/declarative_version.py", line 161, in create
    loop.run_until_complete(pipeline)

  File "/usr/lib64/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 220, in create_pipeline
    await asyncio.gather(*futures)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/api.py", line 41, in __call__
    await self.run()

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 185, in run
    pb.done += task.result()  # download_count

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/artifact_stages.py", line 240, in _handle_content_unit
    await asyncio.gather(*downloaders_for_content)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/models.py", line 119, in download
    raise e

  File "/usr/local/lib/python3.8/site-packages/pulpcore/plugin/stages/models.py", line 111, in download
    download_result = await downloader.run(extra_data=self.extra_data)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/download/http.py", line 273, in run
    return await download_wrapper()

  File "/usr/local/lib/python3.8/site-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/download/http.py", line 258, in download_wrapper
    return await self._run(extra_data=extra_data)

  File "/usr/local/lib/python3.8/site-packages/pulp_rpm/app/downloaders.py", line 118, in _run
    to_return = await self._handle_response(response)

  File "/usr/local/lib/python3.8/site-packages/pulpcore/download/http.py", line 207, in _handle_response
    chunk = await response.content.read(1048576)  # 1 megabyte

  File "/usr/local/lib64/python3.8/site-packages/aiohttp/streams.py", line 385, in read
    await self._wait("read")

  File "/usr/local/lib64/python3.8/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter

To Reproduce

Expected behavior

The syncs should be successful.

Additional context

dralley commented 1 year ago

Generally when users see this, it's because they're hitting the Red Hat CDN often enough that the Akamai DDoS protection starts kicking in and closing connections. Could you try reducing the download_concurrency values on the sync jobs?

dkliban commented 1 year ago

The problem is that the replication task doesn't currently provide any way to set the download_concurrency on the remotes that it creates. The default of 10 is being used everywhere.

For now you can just adjust the download_concurrency on all the remotes on the replica Pulp. The replication task will not adjust that value.
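A minimal sketch of that workaround, assuming a replica reachable at replica.example.com with admin credentials and RPM remotes; it lowers download_concurrency on every RPM remote via the REST API (remote updates are dispatched as tasks, so each PATCH returns 202):

```python
# Sketch of the workaround: lower download_concurrency on all RPM remotes of the
# replica via the REST API. Hostname, credentials, and the value 5 are assumptions.
import requests

BASE = "https://replica.example.com"
AUTH = ("admin", "password")

url = BASE + "/pulp/api/v3/remotes/rpm/rpm/"
while url:
    page = requests.get(url, auth=AUTH).json()
    for remote in page["results"]:
        # Updating a remote is asynchronous; the PATCH is accepted as a task.
        requests.patch(
            BASE + remote["pulp_href"],
            json={"download_concurrency": 5},
            auth=AUTH,
        )
    url = page["next"]  # follow pagination until exhausted
```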

dralley commented 1 year ago

The default of 10 is being used everywhere.

Ah, that is probably it, then. In the RPM plugin we use a lower default (7 I think?) specifically to avoid triggering the CDN activity throttling.

dralley commented 1 year ago

Wait, are the replicas pointing at Pulp? If that's the case then it probably wouldn't be the CDN causing those errors to be thrown? Unless the replicated Pulp is on-demand and downloading from the CDN itself as a proxy?

PotentialIngenuity commented 1 year ago

That is right. It is between two Pulp instances with the policy set to immediate. I was able to work around this error by reducing the download_concurrency value.

dkliban commented 5 months ago

After some discussion at the pulpcore meeting we decided it would be better to limit the number of concurrent syncs.
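For illustration only (this is not pulpcore's tasking code), the general idea behind that direction can be sketched with a semaphore that bounds how many syncs run at once; the limit of 5 is an assumption:

```python
# Illustrative sketch only -- not pulpcore's tasking implementation.
# It shows the general idea: bound how many syncs run at once with a semaphore.
import asyncio

MAX_CONCURRENT_SYNCS = 5  # assumed limit


async def sync_repository(name: str, slots: asyncio.Semaphore) -> None:
    async with slots:  # at most MAX_CONCURRENT_SYNCS syncs run concurrently
        print(f"syncing {name}")
        await asyncio.sleep(1)  # stand-in for the real sync work


async def replicate(names: list[str]) -> None:
    slots = asyncio.Semaphore(MAX_CONCURRENT_SYNCS)
    await asyncio.gather(*(sync_repository(n, slots) for n in names))


if __name__ == "__main__":
    asyncio.run(replicate([f"repo-{i}" for i in range(20)]))
```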