psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
52.04k stars 9.31k forks source link

breaking aws s3 usage with requests 2.32.0 #6711

Open kristianelliott80 opened 4 months ago

kristianelliott80 commented 4 months ago

requests 2.32.0 introduced a change https://github.com/psf/requests/pull/6644 that strips double /.

This has introduced an issue where generated presigned urls for s3 keys that start with a / can no longer be used. requests now strips that second / meaning that the key is modified and the signature does not match resulting in 403 errors. We can adjust to remove the leading / in our keys but this may be affecting other users or use cases

Expected Result

To be able to use presigned_urls for s3 keys with leading "/"

Actual Result

URL that was passed was modified resulting in a 403 error response from aws

Reproduction Steps

import requests
import boto3
s3 = boto3.client('s3')
bucket = "bucket"
key = "/key_with_leading_slash.txt"
presigned_url = s3.generate_presigned_url("get_object", Params={'Bucket': bucket, 'Key': key})
requests.get(presigned_url)

System Information

$ python -m requests.help
{
  "chardet": {
    "version": "5.1.0"
  },
  "charset_normalizer": {
    "version": "3.1.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.9.6"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.32.1"
  },
  "system_ssl": {
    "version": "101010bf"
  },
  "urllib3": {
    "version": "1.26.15"
  },
  "using_charset_normalizer": false,
  "using_pyopenssl": false
}
crabhi commented 3 months ago

I can second this. It's not only AWS. We just hit this problem with a completely unrelated web service that also behaves differently if the double leading slash is normalized to one.

sigmavirus24 commented 3 months ago

I believe the s3 use case is being addressed by AWS in their SDK.

As for a service that expects a URI whose path is not normalized and returns different behavior, that's not compliant with any RFC or security best practices. I'm not losing sleep over that not working

crabhi commented 3 months ago

Thanks for your response! Why do you think double slash is not compliant with any RFC? I agree it's unusual but that service is not under our control.

Reading RFC3986, section 3.3, it seems to me the case of double leading slash is legal. It doesn't contradict anything in the text and it conforms to the ABNF below: using path-abempty (because authority is present, also used in ABNF at the beginning of Sec. 3). The first segment would be zero-length which is allowed in path-abempty as opposed to path-absolute.

crabhi commented 3 months ago

Another case in point - curl handles leading double slash just fine. Source: my testing with mitmproxy.

sigmavirus24 commented 3 months ago

It is absolutely a valid URI. It's also semantically equivalent to the normalized version and must be treated that way. GET //example is equivalent to GET /example

crabhi commented 3 months ago

What defines the normalization? In that RFC, I can find only about "dot-segment removal". This is a genuine question - I'm but a user. I have never had to implement URL parsing, so there may be another RFC I don't know about that defines this normalization.

The path //example is equivalent to /example in a filesystem. But is it necessarily true for an URI/URL?

SpoonMeiser commented 3 months ago

I believe the s3 use case is being addressed by AWS in their SDK.

@sigmavirus24 where have you seen this? Is there an issue or something you can point us towards?

sigmavirus24 commented 3 months ago

I don't follow the issues there but I had a conversation with one of the python SDK maintainers. I won't tag them here though so that they don't get harangued as this community is wont to do

SpoonMeiser commented 3 months ago

I couldn't find an issue for it, so I created one: https://github.com/boto/boto3/issues/4181

@sigmavirus24 if you have another conversation with your maintainer friend, you could point him at this issue, and those of us that don't care whether the requests behaviour is technically right or not and only want it to work with S3, can watch to see when that issue gets resolved.

crabhi commented 3 months ago

For what it's worth, I don't care that much about RFC legalese. The problem is I also see a web service (completely unrelated to S3, not even a storage) that doesn't work correctly when queried from requests.

If we can agree the previous behaviour was a bug WRT the colon handling in the first path segment but the current fix introduced another bug, I can try to contribute a patch to urllib3 to fix the colon issue there, so that #6644 can be reverted.

sigmavirus24 commented 3 months ago

I don't agree that stripping the redundant and superfluous slash is a bug. I also think that urllib3 can be improved in general around this but doing so in a backwards compatible fashion that doesn't recreate this bug but in a lower part of the stack isn't going to improve things

crabhi commented 3 months ago

I tried a few other clients - Python stdlib, Curl, Go stdlib, Elixir Req, Groovy stdlib (probably same as Java) - and I couldn't find another one that strips the slashes. Also, I've tested a few servers (Python http.server, Nginx and Apache) and neither of them "normalizes" the slashes.

https://gist.github.com/crabhi/080d746e3eb4fc53380bc8291cdd0f7d

@sigmavirus24 what leads you to believe the slash is "redundant and superfluous"? I see this is not an issue that would affect many people, so I'm willing to contribute a fix that would work in line with other clients. Other options I have are no joy - staying on 2.31, or maintaining a fork, or using Python stdlib for this particular request.

SpoonMeiser commented 3 months ago

For what it's worth, the issue I raised against boto got rejected, so it appears the python SDK developers are not planning to address this in the SDK.

Instead, the suggestion is for when using requests, to use a prepared request and escape the first slash. I have not tested this myself.