lwthiker / curl-impersonate

curl-impersonate: A special build of curl that can impersonate Chrome & Firefox
MIT License

Response Headers incorrectly reporting content-encoding as gzip #173

Closed: erin-stuelke closed this issue 10 months ago

erin-stuelke commented 10 months ago

I recently upgraded from one of the earliest versions of curl-impersonate (0.3.1) to the latest, and I have noticed that the response headers report content-encoding as gzip even though the response body is not gzip-encoded.

python 3.8.10
pycurl 7.44.1
ubuntu 20.04
libcurl-impersonate-v0.5.4.x86_64-linux-gnu.tar.gz
CURL_IMPERSONATE=chrome110
LD_PRELOAD=/usr/local/bin/curl/libcurl-impersonate-chrome.so

I thought it might be related to the upstream version of curl, so I tried v0.5.1 (latest version that used the same version of curl), but I still got the error. When I reverted to the version I had been using, it worked fine.

I can still read the buffer. However, it is important that I get a compressed response, since I use proxies, which can be pretty expensive.

Any thoughts?

erin-stuelke commented 10 months ago

Figured out the problem! We had not been setting the pycurl.ACCEPT_ENCODING option and were handling the decoding ourselves. The newer version sets this option automatically in the libcurl patch, so libcurl decodes the body before we see it, which caused our own decoding workflow to fail. I double-checked by explicitly unsetting pycurl.ACCEPT_ENCODING, after which our decoding worked without issue. Now that we know it's being set for us automatically, we can remove our decoding workflow.
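
For anyone who wants to keep a manual decoding step while migrating, here is a minimal sketch of a defensive decoder. It assumes the response headers have already been parsed into a dict with lowercase keys. The `decode_body` helper is hypothetical and is not part of pycurl or curl-impersonate. It only gunzips when the body actually starts with the gzip magic bytes, so an already-decoded body passes through untouched even if the header still says gzip.

```python
import gzip

# The first two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def decode_body(headers, body):
    """Hypothetical helper: gunzip the body only if it really is gzip.

    `headers` is assumed to be a dict with lowercase header names.
    """
    encoding = headers.get("content-encoding", "").strip().lower()
    if encoding == "gzip" and body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    # Either no gzip header, or the body was already decoded upstream.
    return body


raw = b'{"ok": true}'
headers = {"content-encoding": "gzip"}

# Body still compressed: it gets decompressed.
assert decode_body(headers, gzip.compress(raw)) == raw
# Header says gzip but the body was already decoded: returned as-is.
assert decode_body(headers, raw) == raw
```

Checking the magic bytes rather than trusting the header alone is a design choice for the transition period. Once the automatic decoding is relied on everywhere, the helper can be dropped.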

Thanks for an awesome library!