jxlil / scrapy-impersonate

Scrapy download handler that can impersonate browsers' TLS signatures or JA3 fingerprints.
MIT License
78 stars, 9 forks

Support proxy authentication with dictionary #13

Open seldcat opened 1 month ago

seldcat commented 1 month ago

Hello!

I hope you're doing well. I have a feature request that I believe would enhance the usability of your project: supporting proxy settings as a dictionary. Currently, when processing a proxy, the parser mistakenly encodes the dictionary, causing the request to the proxy server to fail with error 5.

Proposed Solution

To address this issue, I suggest the following:

1. Detection: When decoding the proxy authentication from the headers, check whether the string contains a dictionary.
2. Encoding: If a dictionary is detected, extract these settings and encode them correctly: urllib.parse.quote(json.dumps(settings, separators=(',', ':')))
3. Replacement: Replace the dictionary in the decoded string with the encoded settings.
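The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the decoded credentials are taken to be a `user:password` string, the dictionary is taken to be serialized as JSON, and the function name `encode_dict_in_credentials` is hypothetical and not part of scrapy-impersonate.

```python
import json
from urllib.parse import quote


def encode_dict_in_credentials(decoded: str) -> str:
    """Percent-encode a JSON-object password inside a 'user:password' string.

    Hypothetical helper: assumes the dictionary is valid JSON and that the
    username contains no colon.
    """
    user, sep, password = decoded.partition(":")

    # Detection: is the password a JSON object?
    try:
        settings = json.loads(password)
    except ValueError:
        return decoded  # not JSON, leave the credentials untouched
    if not isinstance(settings, dict):
        return decoded

    # Encoding: compact JSON, then percent-encode it as proposed in the issue.
    encoded = quote(json.dumps(settings, separators=(",", ":")))

    # Replacement: put the encoded settings back in place of the dictionary.
    return f"{user}{sep}{encoded}"


example = encode_dict_in_credentials('customer:{"token": "abc123", "country": "us"}')
plain = encode_dict_in_credentials("customer:secret")
```

A plain password passes through unchanged, so the helper could in principle be applied to every proxy without special-casing.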

By implementing these steps, the proxy settings should be processed correctly, allowing the request to be sent successfully.

Thank you for considering this enhancement. It would greatly improve the functionality and flexibility of handling proxy settings.

jxlil commented 1 month ago

Hi @seldcat, thanks for your comment.

Could you provide an example of how you are making the request, so I can test?

seldcat commented 1 month ago

If we consider a typical proxy format:

<scheme>://<user>:<pass>@<host>:<port>

In my case, the pass field is a dictionary with keys like token, country, etc. When using such proxies, the current implementation fails because it doesn’t encode the dictionary properly.

To resolve this, you need to encode the dictionary as I described earlier.
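For reference, a proxy URL of this shape might be built along these lines. It is a minimal sketch using only the standard library; the host `proxy.example.com`, the username, the dictionary contents, and the function name `build_proxy_url` are all made-up placeholders, and the dictionary is assumed to be serialized as JSON.

```python
import json
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(scheme: str, user: str, settings: dict, host: str, port: int) -> str:
    """Build <scheme>://<user>:<pass>@<host>:<port> with a dict as the password."""
    # safe="" also escapes "/" so nothing in the password can end the authority part.
    password = quote(json.dumps(settings, separators=(",", ":")), safe="")
    return f"{scheme}://{quote(user, safe='')}:{password}@{host}:{port}"


settings = {"token": "abc123", "country": "us"}
url = build_proxy_url("http", "customer", settings, "proxy.example.com", 8080)

# Parsing the URL back recovers every component, including the dictionary.
parts = urlsplit(url)
recovered = json.loads(unquote(parts.password))
```

Because the password is percent-encoded, characters such as `{`, `"`, `:` and `,` no longer appear literally in the URL.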

seldcat commented 1 month ago

It's worth noting that Scrapy decodes the username and password when collecting the header (see this line in the Scrapy source code). However, scrapy-impersonate does not encode it back. Therefore, if the username or password contains special characters, it will cause a crash.

Also, everything works fine for me if I disable scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware. That middleware decodes the username and password and does not re-encode them, so the decoded password is passed on to the next stage. So perhaps the problem is on my side.
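The decode-without-re-encode problem described above can be illustrated with the standard library alone. This sketch assumes the header carries Basic credentials built from the percent-decoded username and password; `make_basic_auth_header` and `reencode_credentials` are hypothetical names used for illustration and are not Scrapy or scrapy-impersonate APIs.

```python
import base64
from urllib.parse import quote, unquote, urlsplit


def make_basic_auth_header(username: str, password: str) -> bytes:
    """Mimic a middleware that percent-decodes credentials before base64-encoding."""
    raw = f"{unquote(username)}:{unquote(password)}".encode("latin-1")
    return b"Basic " + base64.b64encode(raw)


def reencode_credentials(header_value: bytes) -> str:
    """Recover 'user:password' from the header and percent-encode both parts again."""
    _, _, token = header_value.partition(b" ")
    user, _, password = base64.b64decode(token).decode("latin-1").partition(":")
    return f"{quote(user, safe='')}:{quote(password, safe='')}"


# A password with special characters, percent-encoded in the original proxy URL.
header = make_basic_auth_header("user", "p%40ss%3Aw%2Frd")
credentials = reencode_credentials(header)

# With the credentials re-encoded, the rebuilt proxy URL parses as intended.
proxy = urlsplit(f"http://{credentials}@proxy.example.com:8080")
```

If the decoded password `p@ss:w/rd` were placed in the URL as-is, the extra `@`, `:` and `/` would change how the URL is split, which is consistent with the failure reported here.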

jxlil commented 3 days ago

Sorry, I've been a little busy. I'm going to get back to this today.