psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0

Requests changes the URL on GET #6794

Closed Peque closed 2 months ago

Peque commented 2 months ago

When fetching a URL containing %2c with GET, the URL gets modified and %2c is replaced by %2C.

The problem is that a server may return a 404 for %2C even though the resource exists under %2c (i.e., the server expects you to preserve the lower-case encoding). This should not happen, since the hex digits in URL percent-encoding are case-insensitive, but real servers do not always comply and may require lower case.
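To illustrate the equivalence claim, here is a minimal sketch using only Python's standard library. The path strings come from the reproduction steps in this issue, and the variable names are just for illustration. Both spellings decode to the same characters, so a conforming server should treat them alike:

```python
from urllib.parse import unquote

lower = "/foo/bar%2cbaz.pdf"
upper = "/foo/bar%2Cbaz.pdf"

# The hex digits of a percent-escape are case-insensitive,
# so both spellings decode to a literal comma.
decoded_lower = unquote(lower)
decoded_upper = unquote(upper)

print(decoded_lower)                   # /foo/bar,baz.pdf
print(decoded_lower == decoded_upper)  # True
```

This only shows that the two forms are equivalent once decoded. It says nothing about how a particular server compares the raw request path.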

This comes from a real use case where the server is generating and providing a URL to download a file, and refuses to reply with the file unless %2c is used (instead of %2C).

Expected Result

Requests should not modify the URL, since %2c is already encoded (and lower case is just as valid an encoding as upper case).

Other libraries, such as httpx, already behave this way.

Actual Result

Requests modifies the URL, changing %2c to %2C.
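For illustration only, the kind of case normalization observed here can be mimicked with a regular expression. This is a sketch of the effect, not code taken from requests or its dependencies, and `uppercase_percent_escapes` is a hypothetical helper name:

```python
import re

# Matches one percent-escape: "%" followed by exactly two hex digits.
_PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def uppercase_percent_escapes(url: str) -> str:
    """Upper-case the hex digits of every percent-escape; leave the rest untouched."""
    return _PERCENT_ESCAPE.sub(lambda m: m.group(0).upper(), url)


print(uppercase_percent_escapes("https://domain.com/foo/bar%2cbaz.pdf"))
# https://domain.com/foo/bar%2Cbaz.pdf
```

The rewritten URL is equivalent under the percent-encoding rules, but it is no longer byte-for-byte what the caller passed in, which is exactly what trips up a server that compares raw paths.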

Reproduction Steps

import requests

url = "https://domain.com/foo/bar%2cbaz.pdf"
response = requests.get(url)

print(url)           # https://domain.com/foo/bar%2cbaz.pdf
print(response.url)  # https://domain.com/foo/bar%2Cbaz.pdf

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "5.2.0"
  },
  "charset_normalizer": {
    "version": "3.3.2"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.11.5"
  },
  "platform": {
    "release": "6.10.6-200.fc40.x86_64",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.32.3"
  },
  "system_ssl": {
    "version": "300000c0"
  },
  "urllib3": {
    "version": "2.0.7"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": false
}

sigmavirus24 commented 2 months ago

In the future, please search closed and open issues before creating new ones that are duplicates.

Peque commented 2 months ago

@sigmavirus24 I always do, but I didn't manage to find a duplicate.

Would you mind sharing which is/are the duplicate/s of this issue?

It would also help with traceability, and with future users who land here and want to understand why this issue was closed.

sigmavirus24 commented 2 months ago

#6115

Peque commented 2 months ago

@sigmavirus24 I don't think that is the same issue as the one described here. This issue is about a URL that is already escaped/encoded using %2c, so there is no reason for requests to re-encode it with %2C (both %2c and %2C are valid ways to encode/escape).

Or am I missing something?

sigmavirus24 commented 2 months ago

What you're missing is that in that issue and others linked from it, people wish to control the encoding (or lack thereof) and, more generally, the normalization of the query string, which is what you want as well.