Closed jelmer closed 2 years ago
I looked into this issue and it seems the reason data cannot be read when trying to discover references in the repository above is because the Accept-Encoding
HTTP header is set to gzip
when sending request to the /info/refs
endpoint.
The following diff fixes the observed issue, all tests are still passing but not sure about the side effect of that change though.
diff --git a/dulwich/client.py b/dulwich/client.py
index 33742b0d..0566ec97 100644
--- a/dulwich/client.py
+++ b/dulwich/client.py
@@ -1932,7 +1932,7 @@ class AbstractHttpGitClient(GitClient):
if self.dumb is not True:
tail += "?service=%s" % service.decode("ascii")
url = urljoin(base_url, tail)
- resp, read = self._http_request(url, headers, allow_compression=True)
+ resp, read = self._http_request(url, headers)
if resp.redirect_location:
# Something changed (redirect!), so let's update the base URL
@@ -2236,7 +2236,7 @@ class Urllib3HttpGitClient(AbstractHttpGitClient):
resp.redirect_location = resp_url if resp_url != url else ""
# TODO(jelmer): Remove BytesIO() call that caches entire response in
# memory. See https://github.com/jelmer/dulwich/issues/966
- return resp, BytesIO(resp.data).read
+ return resp, resp.read
HttpGitClient = Urllib3HttpGitClient
Hope that it will hep.
Thanks, that's definitely helpful.
dulwich currently caches the full HTTP response in memory when talking to remote servers. This obviously has both memory and CPU consequences. I attempted to revert this in 97814a9bb5294c4309c83603f261905d99dfc240, but that broke access to some HTTP servers.
Example URL that broke: https://forge.softwareheritage.org/source/swh-loader-core
I've reintroduced the change for now, but we should figure out why the BytesIO() call is necessary today and attempt to remove it again once we find out.