Closed: rohanpm closed this 1 year ago.
Base: 100.00% // Head: 100.00% // No change to project coverage :thumbsup:
Coverage data is based on head (5356e7e) compared to base (04fe0fc). Patch coverage: 100.00% of modified lines in the pull request are covered.
When performing requests with the python-requests library, each returned response holds a reference back to the original request.
For file uploads, this includes a reference to the request body, which is 10MB with the default configuration (files are uploaded in 10MB chunks).
This increases memory usage significantly: these objects sit within a reference cycle, so they can only be freed by the cyclic garbage collector (GC), and that doesn't necessarily run promptly. So although the upload loop was designed to have a bounded expected memory usage (number of concurrent chunks * chunk size), in practice every chunk stayed in memory for an indefinite period after its request completed, leading to actual memory usage much higher than expected.
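The retention described above can be reproduced without any network traffic in a short CPython sketch. `FakeRequest` and `FakeResponse` are hypothetical stand-ins, not classes from python-requests, and the `self_ref` attribute is an assumed placeholder for whatever reference cycle the real response objects take part in (the description does not say which objects form it). With the cyclic GC disabled, dropping the last external reference does not free the request body; only an explicit `gc.collect()` does.

```python
import gc
import weakref

CHUNK_SIZE = 10 * 1024 * 1024  # assumed to match the 10MB default chunk size


class FakeRequest:
    """Hypothetical stand-in for a prepared request that holds an upload body."""

    def __init__(self, body: bytes) -> None:
        self.body = body


class FakeResponse:
    """Hypothetical stand-in for a response that links back to its request."""

    def __init__(self, request: FakeRequest) -> None:
        self.request = request
        # Assumed self-reference standing in for the real reference cycle.
        self.self_ref = self


def request_alive_after_del() -> tuple[bool, bool]:
    """Return (alive right after `del`, alive after `gc.collect()`)."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic GC from running at an arbitrary point
    try:
        resp = FakeResponse(FakeRequest(b"x" * CHUNK_SIZE))
        # Observe the request without keeping it alive.
        req_ref = weakref.ref(resp.request)
        del resp  # the cycle keeps the response's refcount above zero
        alive_before_gc = req_ref() is not None
        gc.collect()  # only the cyclic collector can reclaim the cycle
        alive_after_gc = req_ref() is not None
        return alive_before_gc, alive_after_gc
    finally:
        if was_enabled:
            gc.enable()


print(request_alive_after_del())  # (True, False) on CPython
```

In a real program the GC is enabled, but it runs on allocation-count thresholds rather than when a large buffer becomes garbage, which is why the 10MB bodies can pile up between collections.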
Breaking the unused link from response back to request allows the request (and the request body) to be cleaned up immediately as the reference count drops to zero, without relying on GC. This brings the memory usage back in line with the intended design.
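A minimal sketch of why breaking the link helps, assuming the link is broken by setting the response's `request` attribute to `None` (this is an illustration, not the actual code in this pull request). `FakeRequest` and `FakeResponse` are hypothetical stand-ins, and `self_ref` is an assumed placeholder for the real reference cycle. Even with the cyclic GC disabled, the request and its body are freed at the moment the link is cleared, because nothing else refers to them.

```python
import gc
import weakref
from typing import Optional

CHUNK_SIZE = 10 * 1024 * 1024  # assumed to match the 10MB default chunk size


class FakeRequest:
    """Hypothetical stand-in for a prepared request that holds an upload body."""

    def __init__(self, body: bytes) -> None:
        self.body = body


class FakeResponse:
    """Hypothetical stand-in for a response that links back to its request."""

    def __init__(self, request: FakeRequest) -> None:
        self.request: Optional[FakeRequest] = request
        # Assumed self-reference standing in for the real reference cycle.
        self.self_ref = self


def request_alive_after_unlinking() -> bool:
    """Return whether the request survives clearing `response.request`."""
    was_enabled = gc.isenabled()
    gc.disable()  # show the request is freed with no help from the cyclic GC
    try:
        resp = FakeResponse(FakeRequest(b"x" * CHUNK_SIZE))
        req_ref = weakref.ref(resp.request)
        resp.request = None  # break the unused response -> request link
        # The request's refcount just hit zero, so CPython freed it and its body.
        alive = req_ref() is not None
        # The small response object still waits for the GC; the body does not.
        del resp
        return alive
    finally:
        gc.collect()  # reclaim the leftover response cycle
        if was_enabled:
            gc.enable()


print(request_alive_after_unlinking())  # False on CPython
```

The response itself remains in its cycle until the GC runs, but it no longer pins a multi-megabyte body, so the cost of waiting for the GC becomes negligible.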
This was tested using the "examples/upload-files" script in this repo. Prior to this change, when uploading a ~370MB file the RSS of that process would increase over time from ~100MB to ~270MB. With this change, the RSS remained stable at ~100MB for the entire duration of the upload.
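A rough in-process analogue of this measurement can be sketched with `tracemalloc`. The sketch below simulates an upload loop with hypothetical `FakeRequest`/`FakeResponse` stand-ins (the `self_ref` attribute is an assumed placeholder for the real cycle) and an assumed 1MiB chunk size, smaller than the 10MB default to keep it light. It reports how much memory is still allocated after the loop, with and without breaking the response-to-request link. It measures Python-level allocations rather than RSS, so only the trend (growth proportional to the chunk count versus roughly flat) is meant to correspond to the figures above.

```python
import gc
import tracemalloc
from typing import Optional

CHUNK_SIZE = 1024 * 1024  # assumed 1MiB chunks to keep the demo small


class FakeRequest:
    """Hypothetical stand-in for a prepared request that holds an upload body."""

    def __init__(self, body: bytes) -> None:
        self.body = body


class FakeResponse:
    """Hypothetical stand-in for a response that links back to its request."""

    def __init__(self, request: FakeRequest) -> None:
        self.request: Optional[FakeRequest] = request
        self.self_ref = self  # assumed stand-in for the real reference cycle


def simulated_upload(num_chunks: int, break_link: bool) -> int:
    """Return bytes still allocated after 'uploading' num_chunks chunks."""
    was_enabled = gc.isenabled()
    gc.disable()  # model the worst case: no GC run during the upload
    tracemalloc.start()
    try:
        for _ in range(num_chunks):
            resp = FakeResponse(FakeRequest(b"x" * CHUNK_SIZE))
            if break_link:
                resp.request = None  # frees the chunk as soon as it is done
            del resp
        current, _peak = tracemalloc.get_traced_memory()
        return current
    finally:
        tracemalloc.stop()
        gc.collect()  # reclaim the leaked cycles before returning
        if was_enabled:
            gc.enable()


print("link kept:  ", simulated_upload(20, break_link=False))
print("link broken:", simulated_upload(20, break_link=True))
```

With the link kept, the retained memory grows by roughly one chunk per iteration; with the link broken, only the small response objects remain until the GC runs.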