opensearch-project / opensearch-go

Go Client for OpenSearch
https://opensearch.org/docs/latest/clients/go/
Apache License 2.0

[BUG] AWS Signing Requests Broken when URL has ports (SSH Tunnel) #370

Open TheFynx opened 1 year ago

TheFynx commented 1 year ago

What is the bug?

When OpenSearch requests have to be signed (e.g., AWS-hosted OpenSearch with IAM auth), signing only works when the URL has no port. At least, that is the case with the AWS SDK v2 signer, which is the only one I've tested.

Somewhere there is a disconnect: either the port is not being removed when it should be, or it is being removed when it shouldn't be, on the URL passed to the signing call.

I have tested this with reverse proxies, sshuttle, on the bastion itself, etc. Everything works except when the URL contains a port; then there is a signature error.

How can one reproduce the bug?

Failure

Success

What is the expected behavior?

I should be able to pass any OpenSearch URL to the library, including one with a port, and have request signing work with it.

The following should work without modifications on my end

What is your host/environment?

PopOs! 22.04

Do you have any screenshots?

No screenshots, but here is my output showing the difference

Request Headers (truncated):

GET /_cat/indices?format=json&human=true&pretty=true HTTP/1.1
Host: localhost:34211
User-Agent: opensearch-go/2.3.0 (linux amd64; Go 1.21.0)
Authorization: AWS4-HMAC-SHA256

Response (truncated):

GET http://localhost:34211/_cat/indices?format=json&human=true&pretty=true
403 Forbidden 1.311s
The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.

The Canonical String for this request should have been

GET /_cat/indices\nformat=json&human=true&pretty=true
host:localhost

Do you have any additional context?

N/A

dblock commented 1 year ago

There are some fixes related to signing on main. Would you care to try to reproduce this with the latest code and post a detailed repro, so we can debug from there?

TheFynx commented 1 year ago

Still running into the same issue, I'll just go with an abundance of info

TheFynx commented 1 year ago

So, just tested some changes to the signer. Just doing this works... but I'm not sure if this is the right solution

Host is actually empty in the request passed here, so the signer takes r.URL, which has the port in it. If you set Host, there are no issues. (I already tried setting the Host in the Header option of the client, but it didn't make its way to the Signer.)

func (s *awsSdkV2Signer) SignRequest(r *http.Request) error {
    ctx := context.Background()
    t := time.Now()

    // Extract just the hostname part
    r.Host = strings.Split(r.URL.Host, ":")[0]

    creds, err := s.awsCfg.Credentials.Retrieve(ctx)
    if err != nil {
        return err
    }

    if len(s.awsCfg.Region) == 0 {
        return fmt.Errorf("aws region cannot be empty")
    }

    hash, err := hexEncodedSha256OfRequest(r)
    if err != nil {
        return err
    }
    // Set the payload hash header only after hashing has succeeded.
    r.Header.Set("X-Amz-Content-Sha256", hash)

    return s.signer.SignHTTP(ctx, creds, r, hash, s.service, s.awsCfg.Region, t)
}
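
As an aside on the port-stripping line in the snippet above: splitting on ":" would mishandle a bracketed IPv6 host such as [::1]:9200. A minimal sketch of the same idea using the standard library's url.URL.Hostname follows. The helper name hostWithoutPort is hypothetical and not part of opensearch-go, and whether stripping the port is the right fix at all is exactly what the rest of this thread debates.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostWithoutPort parses rawURL and returns its host with any port removed.
// Hypothetical helper for illustration only; not part of opensearch-go.
func hostWithoutPort(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// Hostname strips a valid port and also removes the square brackets
	// that surround IPv6 literals.
	return u.Hostname(), nil
}

func main() {
	for _, raw := range []string{
		"https://localhost:39163/_cat/indices",
		"https://[::1]:9200/_cat/indices",
	} {
		host, err := hostWithoutPort(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(raw, "->", host)
	}
}
```

Note that Hostname drops the brackets from an IPv6 literal, so they would need to be restored before using the result as a Host value.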

dblock commented 1 year ago

That r.Host assignment changes the host value during signing, which looks suspicious. It's probably not the right fix. But looking at this, what's the value of r.URL here? Is the signer supposed to ignore the port? (I don't think so.)

TheFynx commented 1 year ago

The r.URL was returning https://localhost:39163/_cat/indices?format=json&human=true&pretty=true and it would set the Host as localhost:39163 when it signed.

However, AWS is looking for localhost, as that's what is being reported as the host. I assume something on their end doesn't expect a port alongside the hostname, because when you force r.Host to be just localhost, everything works.

dblock commented 1 year ago

This use case of proxying through a local SSH tunnel is a bit unusual. I'm pretty sure that if an AWS service ran on a non-default port, the port would have to be present in the Host header, and you'd be required to include it when calculating the signature. This is also interesting: https://github.com/aws/aws-cli/issues/2883.

So I don't think we should be stripping the port for an unusual case like this unless we're 100% convinced it's the right thing to do and that it doesn't introduce regressions. If you want to hang in here with me, I'd want to know whether any of the following fixes or exhibits the same problem:

  1. Another client, e.g. awscurl and/or opensearch-py.
  2. Give localhost a different, non-local looking name.
  3. Try with 127.0.0.1 or a non-loopback IP.

dblock commented 1 year ago

@TheFynx thinking about this more, the actual service port is 443, but you're signing requests with port 39163 because of your proxy, so that fails. I think that's expected: the port is incorrect.

The other question is why host=localhost works when you're actually talking to ...us-west-2.es.amazonaws.com. I'll talk to the server team. I'd expect your workaround of stripping the port to fail too.
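
To make the port mismatch concrete, here is a minimal sketch of one way a client might derive the host value it signs: keep the port unless it is the default for the scheme. This is an illustration under that assumption, not the actual logic of opensearch-go or the AWS SDK, and hostForSigning is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostForSigning returns the host part of rawURL, keeping the port only when
// it is not the default for the scheme (443 for https, 80 for http).
// Hypothetical helper; an assumption about signer behavior, not a library API.
func hostForSigning(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	port := u.Port()
	isDefault := port == "" ||
		(u.Scheme == "https" && port == "443") ||
		(u.Scheme == "http" && port == "80")
	if isDefault {
		// Trim only the port suffix so IPv6 brackets are preserved.
		return strings.TrimSuffix(u.Host, ":"+port), nil
	}
	return u.Host, nil
}

func main() {
	for _, raw := range []string{
		"https://search.example.com:443/_cat/indices", // service side: default port
		"https://localhost:39163/_cat/indices",        // tunnel side: random local port
	} {
		host, err := hostForSigning(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(raw, "->", host)
	}
}
```

Under this assumption, the tunnel URL yields a host with the local port attached, which cannot match a value the service derives from its own port 443.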

dblock commented 1 year ago

I tried to reproduce this but couldn't get a tunnel that forwards HTTPS requests to work.

  1. I have an AOSS collection, e.g. xyz.us-west-2.aoss.amazonaws.com:443
  2. I can awscurl --service=aoss --region $AWS_REGION https://xyz.us-west-2.aoss.amazonaws.com/_cat/indices successfully.
  3. What combination of sshuttle or ssh do I run locally to do awscurl --service=aoss --region $AWS_REGION https://localhost:1234/_cat/indices?

TheFynx commented 1 year ago

@dblock I'm using an AWS-hosted OpenSearch in a private VPC, not AOSS. I'm actually pretty sure AOSS can only be public, so to tunnel you'd have to set up a private link, a VPC, and a bastion to ssh into just to test that.

These are the docs from AWS on how to access an OpenSearch cluster in a VPC: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/vpc.html#vpc-test

Tunneling is a pretty standard practice for accessing private resources when you don't have a VPN into your private VPC.

I can't use awscurl because it doesn't support the AWS sso-session configs (or, I think, any SSO access at all: https://github.com/okigan/awscurl/issues/114), which is why I'm using the opensearch-go library to add the features I need to our own local dev CLI.

dblock commented 1 year ago

Ok, I was thrown off by "AWS Hosted OpenSearch with IAM Auth", I thought you meant the Amazon Managed OpenSearch Service. So your OpenSearch hosted on an AWS EC2 instance runs on port 443, but your tunnel listens on port 39163? It seems to make sense that if you sign with port 39163 it doesn't work, it's the wrong port. And stripping the port works because 443 is a default port for HTTPS.

So we're back to questioning whether a feature that allows overriding the value of host:port for AWS SigV4 signing is needed. I think the answer to this is "no": this doesn't seem like a realistic production scenario (doing SigV4 behind an authenticated SSH tunnel). But I'm open to hearing whether other clients, and the AWS SDK, support this use case, and how.

Would it be a workaround to run the tunnel on the same port as OpenSearch? So localhost:443 or run OpenSearch on port 39163?

TheFynx commented 1 year ago

I am using an AWS Managed OpenSearch Service... it's just with VPC enabled. So it's all my private IPs being used via Amazon's service. Amazon has three managed OpenSearch options: normal cluster/domain, Ingestion, and Serverless. We're using the normal cluster/domain, just set up with the VPC options (https://docs.aws.amazon.com/opensearch-service/latest/developerguide/vpc.html).

So the port doesn't matter for localhost. It's random, and it's just a proxy port to 443 on the other end.

So ssh -i ~/mykeypair.pem -N -L 9200:#####.us-west-2.es.amazonaws.com:443 ubuntu@ec2-###-##-##-###.compute-1.amazonaws.com

Means

So all local requests are going to https://#####.us-west-2.es.amazonaws.com but the origin is still localhost.

I also found this in your Terraform module, as I am currently converting it to work with Pulumi. It specifically calls for you to override the Host when going through an SSH tunnel. So it seems like this is a known thing and is handled the way I mentioned above: you need to strip your local port, because the signed request should only show the host and not localhost:$port.

I haven't tested overriding localhost:$port with #####.us-west-2.es.amazonaws.com, since just using localhost worked for my purposes, but I can once I have another cluster up and running.

https://registry.terraform.io/providers/opensearch-project/opensearch/latest/docs#connecting-to-a-cluster-via-an-ssh-tunnel

dblock commented 1 year ago

Get an awscurl request to work with this setup, and we can see how we should alter the client if at all.

thecjharries commented 1 year ago

@dblock awscurl does not work with AWS SSO. See awscurl #114 as linked by @TheFynx here.

dblock commented 1 year ago

> @dblock awscurl does not work with AWS SSO. See awscurl #114 as linked by @TheFynx here.

Any other tool that supports SigV4 that you can make work? I just want to see how others implement support for switching hosts.

TheFynx commented 1 year ago

This was brought up before in the OpenSearch Python client; this is the solution proposed there

There is this project from AWS, https://github.com/awslabs/aws-sigv4-proxy

I found an example of how you have to do it with awscurl and Neptune, using a host override:

awscurl -k --service neptune-db --access_key $ACCESS_KEY --secret_key $SECRET_KEY --region <neptune_instance_region> --session_token $SESSION_TOKEN --header 'host: <neptune-cluster-endpoint-withouthttp-withoutport>' https://localhost:8182/status

dblock commented 1 year ago

@TheFynx Great! All of these allow users to override any number of headers, without specifically doing anything about the host or port. I would merge a change that allows overriding headers, and thus specifically overriding the Host header.
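
One way such an override could look on the caller's side is a thin wrapper that pins the Host on the request before delegating to the real signer. The sketch below is only an illustration of that idea, not the change that was merged. The Signer interface is declared locally (assuming a single SignRequest method, as in the snippet earlier in the thread), and hostOverrideSigner, recordingSigner, and signedHostFor are hypothetical names. It relies on the earlier observation in this thread that setting r.Host changes what gets signed.

```go
package main

import (
	"fmt"
	"net/http"
)

// Signer is declared locally so this sketch compiles on its own. It assumes
// a signer exposes a single SignRequest method.
type Signer interface {
	SignRequest(r *http.Request) error
}

// hostOverrideSigner (hypothetical) sets r.Host to a fixed value and then
// delegates to the wrapped signer. In net/http, a non-empty r.Host on a
// client request also overrides the Host header that is sent.
type hostOverrideSigner struct {
	inner Signer
	host  string
}

func (s hostOverrideSigner) SignRequest(r *http.Request) error {
	if s.host != "" {
		r.Host = s.host
	}
	return s.inner.SignRequest(r)
}

// recordingSigner stands in for a real SigV4 signer and only records the
// host value it was asked to sign.
type recordingSigner struct{ signedHost string }

func (s *recordingSigner) SignRequest(r *http.Request) error {
	s.signedHost = r.Host
	if s.signedHost == "" {
		s.signedHost = r.URL.Host
	}
	return nil
}

// signedHostFor builds a GET request for rawURL, signs it through the
// wrapper, and reports which host the inner signer saw.
func signedHostFor(rawURL, override string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "", err
	}
	rec := &recordingSigner{}
	wrapped := hostOverrideSigner{inner: rec, host: override}
	if err := wrapped.SignRequest(req); err != nil {
		return "", err
	}
	return rec.signedHost, nil
}

func main() {
	// search.example.com is a placeholder for the real domain endpoint.
	host, err := signedHostFor("https://localhost:9200/_cat/indices", "search.example.com")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("host seen by signer:", host)
}
```

The request URL still points at the local tunnel, while the signer sees the overridden host, which mirrors what the awscurl --header 'host: ...' example does.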