release-engineering / pubtools-pulplib

A Pulp library for publishing tools
GNU General Public License v3.0
2 stars 24 forks source link

Reduce memory usage when uploading large files [RHELDST-15834] #197

Closed rohanpm closed 1 year ago

rohanpm commented 1 year ago

When performing requests with the python-requests library, each returned response holds a reference back to the original request.

In the case of uploading files, this includes a reference to the request body, which will be 10MB when using the default configuration (of uploading files in 10MB chunks).

This increases memory usage significantly because these objects exist within a reference cycle and thus rely on GC to be freed, but that doesn't necessarily happen promptly. Thus, although the upload loop was designed to have a max expected memory usage (number of concurrent chunks * size of chunks), in fact every chunk had to remain in memory for an indefinite period even after each request completed, leading to an actual memory usage much higher than expected.

Breaking the unused link from response back to request allows the request (and the request body) to be cleaned up immediately as the reference count drops to zero, without relying on GC. This brings the memory usage back in line with the intended design.

This was tested using the "examples/upload-files" script in this repo. Prior to this change, when uploading a ~370MB file the RSS of that process would increase over time from ~100MB to ~270MB. With this change, the RSS remained stable at ~100MB for the entire duration of the upload.

codecov[bot] commented 1 year ago

Codecov Report

Base: 100.00% // Head: 100.00% // No change to project coverage :thumbsup:

Coverage data is based on head (5356e7e) compared to base (04fe0fc). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #197 +/- ## ========================================= Coverage 100.00% 100.00% ========================================= Files 46 46 Lines 2948 2950 +2 ========================================= + Hits 2948 2950 +2 ``` | [Impacted Files](https://codecov.io/gh/release-engineering/pubtools-pulplib/pull/197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=release-engineering) | Coverage Δ | | |---|---|---| | [pubtools/pulplib/\_impl/client/client.py](https://codecov.io/gh/release-engineering/pubtools-pulplib/pull/197?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=release-engineering#diff-cHVidG9vbHMvcHVscGxpYi9faW1wbC9jbGllbnQvY2xpZW50LnB5) | `100.00% <100.00%> (ø)` | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=release-engineering). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=release-engineering)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.