Open duncanlutz opened 2 months ago
+1, our test suite has also picked this up.
Might be due to https://github.com/lexiforest/curl_cffi/commit/9c13b830f378687900ddbb953ae8edb9998b3b1d
As a side note, I don't think you should be url-decoding and then re-encoding the query component, as it may not produce the same result. https://datatracker.ietf.org/doc/html/rfc3986#section-2.4
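A minimal illustration of why the round trip can be lossy (my example, using the standard library's urllib.parse rather than curl_cffi's internals):

```python
from urllib.parse import quote, unquote

original = "a%2Fb"                    # one path segment containing an escaped '/'
roundtrip = quote(unquote(original))  # quote() treats '/' as safe by default
print(roundtrip)                      # 'a/b' -- now two path segments, a different URL
```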
Here's what urllib3 does, for example: https://github.com/urllib3/urllib3/blob/main/src/urllib3/util/url.py#L227
Thanks, my bad. I should really think this through.
I can confirm this causes issues on real-life sites, and it's very annoying to debug too; I almost started pulling my hair out before I found this issue.
@lexiforest.
Since curl_cffi aims to be API-compatible with the requests library, may I suggest using requests' requote_uri()? It is the standard way the library deals with URL-encoded strings.
```python
from requests.utils import requote_uri

assert requote_uri("https://duncanlutz.dev/example/%2f%2f%2f") == "https://duncanlutz.dev/example/%2f%2f%2f"
assert requote_uri("https://duncanlutz.dev/e x a m p l e") == "https://duncanlutz.dev/e%20x%20a%20m%20p%20l%20e"
```
It covers #333 while also fixing this current issue.
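For reference, `requote_uri` boils down to roughly these two steps (a simplified sketch; the real function also has a fallback path for strings containing bare `%`):

```python
from urllib.parse import quote
from requests.utils import unquote_unreserved

def requote_uri_sketch(uri: str) -> str:
    # 1. Decode only escapes of *unreserved* characters (A-Za-z0-9-._~);
    #    escapes of reserved characters such as %2F stay percent-encoded.
    # 2. Re-quote with the reserved set (and '%') marked as safe, so
    #    existing escapes and delimiters are never double-encoded.
    safe = "!#$%&'()*+,/:;=?@[]~"
    return quote(unquote_unreserved(uri), safe=safe)
```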
Hi, I'm seeing the same thing: encoding is applied where it should not be (due to this change), breaking some sites:
```python
from curl_cffi.requests import request

url = 'https://example.com/imaginary-pagination:7'
print(url)
print(request("GET", url).request.url)
```

Output:

```
https://example.com/imaginary-pagination:7
https://example.com/imaginary-pagination%3A7
```
It would be great to have an option to control URL encoding per request.
Hi, folks. Please check out #405 and let me know if it fixes your problems.
About the urllib3 and requests solutions: I did experiment with them. However, I feel that we should give users more control over whether certain characters, like `:`, should be quoted or not.
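Concretely, the idea is a per-request knob along these lines (a sketch of the intended interface; the exact parameter name and accepted values are my reading of #405, so see that PR for the authoritative version):

```python
from curl_cffi import requests

# Assumed per-request `quote` option from #405: quote=False would send
# the URL as-is, while the default still escapes characters that
# libcurl cannot accept (e.g. spaces).
r = requests.get("https://example.com/imaginary-pagination:7", quote=False)
print(r.request.url)  # expected: https://example.com/imaginary-pagination:7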
Should be fixed in v0.7.3.
This is still an issue on 0.7.3 (particularly `+` and `=`). You should just stop modifying URLs! It's always going to lead to trouble, and having to manually test and change quote values for every request is not viable.
Hi, could you please add a few examples? Some characters DO need to be quoted, like spaces, otherwise libcurl will throw an error. As for `+` and `=`, I guess they are being mistakenly unquoted from `%3D` to `=`, right?
Yes, it is a URL being unquoted.
Here is an example URL (the original, then what was actually sent):
```
https://example.com/path?token=example%7C2024-10-20T10%3A00%3A00Z%7ZYJkEtJQoGNQ3lyQRSnYbWLXUCUNVPQrBDW3VDEBWd1CIrShUzWBQTvzwXEtLZwy8uAxIM%2B3ke%2BQW%2F%2FkyJzGGogANuv5rw%2FXXp%2B5hZz2RW28%3D%7C8bd02e990e29ec76b54cec894e1470b4157fc1ed
https://example.com/path?token=example%7C2024-10-20T10:00:40Z%7ZYJkEtJQoGNQ3lyQRSnYbWLXUCUNVPQrBDW3VDEBWd1CIrShUzWBQTvzwXEtLZwy8uAxIM+3ke+QW//kyJzGGogANuv5rw/XXp+5hZz2RW28=%7C8bd02e990e29ec76b54cec894e1470b4157fc1ed
```
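For what it's worth, one concrete reason this corrupts such tokens: most servers decode the query string as form data, where a literal `+` means a space, so `%2B` and `+` are not interchangeable. A quick stdlib demonstration (my addition, not curl_cffi behavior):

```python
from urllib.parse import parse_qs

print(parse_qs("token=a%2Bb"))  # {'token': ['a+b']}
print(parse_qs("token=a+b"))    # {'token': ['a b']} -- a different value
```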
I see, this is not what I would expect either. Sorry for the mess; it will be fixed in the next minor version.
The same happens with encoded commas as well, and probably with all encoded characters.
Describe the bug
We have an endpoint which uses IDs that contain URL-encoded special characters. After the update to curl_cffi 0.7.2, these requests began failing. After some investigation we found the package had started URL-decoding the URL before sending the request. In our case, the special character is `%2f` (an encoded `/`), which causes the URL to be malformed and the request to fail.

To Reproduce
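A minimal sketch of the failure mode (the endpoint here is hypothetical and stands in for our real API):

```python
from curl_cffi import requests

# Hypothetical endpoint; the real IDs contain %2f, an encoded '/'.
url = "https://api.example.com/items/abc%2fdef"
r = requests.get(url)
# On 0.7.2 the request goes out to /items/abc/def -- two path segments
# instead of one ID -- so the server cannot find the resource.
print(r.request.url)
```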
I've also put a repo up with the example: https://github.com/duncanlutz/curl_cffi_issue
Expected behavior
In previous versions, the URL was not decoded before making the request. Our expected behavior would be either not decoding the URL at all, or providing a way to opt out of decoding.
Versions
`pip freeze` dump:

Additional context