tedder / requests-aws4auth

Amazon Web Services version 4 authentication for the Python Requests module
MIT License
179 stars 63 forks source link

New library is broken #65

Open leolorenzoluis opened 1 year ago

leolorenzoluis commented 1 year ago

The updates for the netloc is broken. It doesn't consider other ports besides 443/80 which makes it broken and computes the wrong signature.

tedder commented 1 year ago

What service and URL are you using that is breaking?

tedder commented 1 year ago

Released as v1.2.2. Leaving this ticket open for more details.

phillipberndt commented 1 year ago

Interesting. For me, it's the other way around. And obviously it's the same for others - in the original ticket multiple people claimed that it didn't work before, whereas in this ticket it is stated that the change is what breaks things in the first place!?

Looking at the verbose headers e.g. via Wireshark, without the reverted commit, I can clearly see a host header with a port is sent for non-default ports, whereas the generated header used by this library is used for computing the signature, making it invalid.

I suspect that the mocking library or whatever else you're using this against has the same bug as this library had. So yeah, would be interesting to take a look at the broken endpoints.

leolorenzoluis commented 1 year ago

@tedder I'm using 'es', and just doing a url of https://localhost:9200. This is a case of doing ssh tunnel against a managed OpenSearch cluster which is in port 443.

@phillipberndt I believe the logic was replaced from the original code. Before, it was separating the port and domain name. However, the current one includes the port as part of the signing request when it should not.

phillipberndt commented 1 year ago

aws4 signatures must be computed from the headers in the request, and must include the Host header. See https://docs.aws.amazon.com/general/latest/gr/create-signed-request.html.

The reason for having special code for ports 80 and 443 is that the low level HTTP library used under the hood by the requests library generates the Host header after invoking the authenticator, forcing this library to compute it upfront, and the underlying implementation includes the port in the Host header if it is not 443 for https or 80 for http.

You transparently forward from port 9200 to port 443 and your ES endpoint, without making requests aware of the fact. That means that it generates a wrong Host header, which the ES service is apparently willing to ignore. The bug in this library caused the wrong port number (9200 rather than 443) of the Host header to be stripped when computing the signature, and that caused things to work for you. But you still relied on a bug.

The proper thing to do would be for you to explicitly set a Host header with the redirect target (your ES endpoint) when submitting the request. The library will then not attempt to build a host header from the URL of the request, but rather use yours. (This is normally always required when doing HTTPS requests though SSH port forwarding. Check with SSH forwarding to a random smaller website requiring SNI: Fetching through https://localhost:9200 without explicitly overriding the Host header will usually not work, because the target server won't accept the unexpected Host header / SNI field.)

phillipberndt commented 1 year ago

So, what's the path forward, @tedder? As I see it, there's two options, and which of the two to go with is a design decision you'll have to make:

Either you decide to prioritize backwards compatibility over correctness in edge cases. In that case, reopen #34, and add a comment there and in the documentation stating that ports are stripped from the host header before signing for historic reasons. Or decide to reapply my change, and leave this ticket unresolved.

Both positions make sense, and which is better really depends on what audience you target with this library.

I don't think we need API changes to support both use cases. Anyone can work around their respective undesired behavior by just explicitly setting the Host header already. And this doesn't affect users who just make normal calls against AWS without any fiddling anyway.

tedder commented 1 year ago

We need unit tests that show how this works and doesn't work.

phillipberndt commented 1 year ago

So the idea is to have another unit test on top of my change to verify that the port is included if it is not the default port?

Explicitly set Host headers and default port behavior were already covered.

Do you also want a test to cover that urllib3 generates the Host header with/without port depending on the port number? That's probably only possible though monkey patching and relying on internals of the library though. An alternative to this would be to have this library explicitly set the Host header if it's not present..

phillipberndt commented 1 year ago

Happy to contribute these, waiting for your opinion before taking the next steps..

tedder commented 1 year ago

hey @phillipberndt, what we really need at this point are tests that give coverage to what is working, what isn't, and what gets broken in the regression. I know the tests won't all succeed right now, which is fine.

phillipberndt commented 1 year ago

That won't change that at the end of the process either this bug or the other one will remain open indefinitely. Either we change the library such that host header in the request and in the canonical request string match or we don't.

I'd like to know upfront which of the two you're leaning towards before investing more time, because all I really need is a library where they match, and if that's not the goal you envision I'll be better off spending that time looking for alternatives :)

tedder commented 1 year ago

for sure, there will be failing tests. I think your approach is correct- change it to match- but it's hard to communicate it.

After your change is made I may be able to add a config to fix the regression.

phillipberndt commented 1 year ago

Finally found time for this. PR with updated unit tests is open, the failing one is exactly the scenario described here.

tedder commented 1 year ago

@leolorenzoluis can you check out Phillip's unit test and make sure it completely covers your use case?

phillipberndt commented 1 year ago

The currently failing test is the negation of his use case - if a URL uses a non-standard port, then this library currently does not construct the string to sign correctly, with Host header sent with the request mismatching from the copy in the signature string. In my version of the test that causes the test to fail, he would have written it to succeed the test in that case.