As an update: if I remove --retry-https there is a series of redirects, ultimately leading to a Cloudflare CDN, ending with the error
"error": "remote error: handshake failure"
But that still doesn't explain why retrying with HTTPS gives the TLS oversized record error.
@Rich5 thanks for sending this in, you actually found a really nice bug that may be having a big impact across large datasets. The workaround for now is to use --no-sni. If you want to read the details below feel free, but that's all you need to know. If you don't mind, can you rename this issue to something like "SNI server name not updated on redirects causing handshake failure", so it's clear from the title what the root cause is?
There are a few things going on here. It seems that SNI is the root cause of everything. The bug (specifically) is that zgrab2 uses SNI by default- but in the case of redirects from one SSL endpoint to another, the SNI server name value does not get updated after the first redirect- so for servers that use SNI, you'll either get the wrong certificate or (in this case) the handshake will fail completely.
The invalid length issue is expected behavior, but I understand why it's confusing or seems wrong- here's what's going on, if you're curious.
For your case, the 301/302 chain goes:
http://granite-logistics.com:80 -> https://granite-logistics.com:443 -> https://www.granite-logistics.com:443
The handshake is failing on the last one because it's sending an SNI server name of "granite-logistics.com" when it should be sending an SNI server name of "www.granite-logistics.com"
The SSL message length error is the result of zgrab2 expecting an SSL handshake from a plaintext server, because the SNI issue caused the first request chain to fail. To see what I mean, compare what a plaintext HTTP response looks like vs. a valid SSL server hello:
00000000: 4854 5450 2f31 2e31 0a HTTP/1.1. # HTTP/1.1 <response code> response
00000000: 1603 0300 5d02 0000 59 ....]...Y # SSL server hello, the response to SSL client hello
This is obvious if you convert the decimal length from the zgrab2 error message to hex (and then print it as ASCII):
printf "%x\n" 20527 # 20527 as hex
502f
printf "%c%c\n" 0x50 0x2f # 0x502f as ASCII printable
P/
Whether this should be the behavior or not is debatable- it probably shouldn't be- but I think it is currently expected. The initial request (in HTTP mode) went to granite-logistics.com:80 and got a successful redirect, but the scan is considered failed because of the final handshake failure caused by the SNI issue described below. Because of that, it invokes the --retry-https logic against the same endpoint (granite-logistics.com:80), which obviously does not expect an SSL/TLS handshake. At that point, zgrab2 receives the plaintext HTTP response and interprets it as an invalid SSL protocol message, hence the invalid length error.
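To make that concrete, here's a small standalone Go sketch (plain encoding/binary, not zgrab2 code) that parses the first five bytes of a server's reply as a TLS record header. Feed it a real server hello and you get a sane length; feed it a plaintext HTTP response and the bytes "P/" land in the length field, which is exactly where the 20527 comes from:

package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// Every TLS record starts with a 5-byte header:
	// type (1 byte), version (2 bytes), length (2 bytes, big-endian).
	serverHello := []byte{0x16, 0x03, 0x03, 0x00, 0x5d}           // real TLS server hello header
	httpReply := []byte("HTTP/1.1 301 Moved Permanently\r\n")[:5] // plaintext reply from port 80

	for _, hdr := range [][]byte{serverHello, httpReply} {
		recType := hdr[0]
		version := binary.BigEndian.Uint16(hdr[1:3])
		length := binary.BigEndian.Uint16(hdr[3:5])
		fmt.Printf("type=0x%02x version=0x%04x length=%d\n", recType, version, length)
	}
	// Output:
	//   type=0x16 version=0x0303 length=93     <- handshake record, TLSv1.2, 93 bytes of payload
	//   type=0x48 version=0x5454 length=20527  <- 'H', "TT", "P/" -> the bogus "oversized record" length
}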
@zakird, @dadrian I initially thought this was going to be a rather uninteresting bug (or perhaps even user error- sorry @Rich5 :) - but it looks like there is a legitimate issue with the logic that handles SNI. The SNI server name is not updated in some cases- at the very least, it's not updated when a redirect chain goes HTTP->HTTPS->HTTPS. It may not be updated for any redirect- I'm not sure yet, I haven't had time to test. Here's the best I can do to illustrate the issue:
This is the last hop, where the SNI error occurs:
› host www.granite-logistics.com
www.granite-logistics.com is an alias for 5109730.group30.sites.hubspot.net.
5109730.group30.sites.hubspot.net is an alias for group30.sites.hscoscdn30.net.
group30.sites.hscoscdn30.net has address 199.60.103.227
echo 199.60.103.29,www.granite-logistics.com | zgrab2 http --max-redirects 10 -p 443 --use-https
- WORKS, as expected
echo 199.60.103.29,www.granite-logistics.com | zgrab2 http --max-redirects 10 -p 443 --use-https --no-sni
- WORKS, as expected
echo 199.60.103.29,www.granite-logistics.com | zgrab2 http --max-redirects 10 -p 443 --use-https --server-name=www.granite-logistics.com
- WORKS, as expected
openssl s_client -connect 199.60.103.227:443 -servername www.granite-logistics.com
- WORKS, as expected
The full 301 chain when requesting http://granite-logistics.com is:
[0] http://granite-logistics.com
[1] https://granite-logistics.com:443
[2] https://www.granite-logistics.com:443
echo 148.62.23.244,granite-logistics.com | ./zgrab2 http --max-redirects 10 --retry-https
- FAIL, handshake failed on the last redirect
echo 148.62.23.244,granite-logistics.com | ./zgrab2 http --max-redirects 10 --retry-https --no-sni
- WORKS, as expected
I checked the wire for the failure case- the first two hops of the chain succeeded, but the HTTPS negotiation on the last hop (https://www.granite-logistics.com:443) failed because the SNI server name was set to "granite-logistics.com" (the value from the previous hop) instead of "www.granite-logistics.com"
So in this case the endpoint respects SNI, but SNI is optional: omit it entirely and the handshake succeeds, but send an incorrect name and the server kills the handshake. I assume this is a very common configuration, and it's only becoming more common as sites move to CDNs that support SNI
The final example, just showing it will not take an invalid/incorrect/unknown SNI name:
openssl s_client -connect 199.60.103.227:443 -servername www.granite-logistics.com
- WORKS
openssl s_client -connect 199.60.103.227:443 -servername granite-logistics.com
- FAILS, handshake failed
It seems to me that the problem is that the SSL context needs to be set to match the Location before each redirect occurs- but that's not happening. The SNI server name gets set once, when the target is first contacted over HTTPS, and never updated after that. Is this a golang issue or a zgrab2 issue?
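Just to illustrate the idea (this is a sketch using the standard library's crypto/tls and net/url, not zgrab2/zcrypto code): the SNI value has to be re-derived from each hop's Location before dialing it, rather than carried over from the first HTTPS connection:

package main

import (
	"crypto/tls"
	"fmt"
	"net/url"
)

// sniFor derives the SNI server name for the next hop in a redirect chain
// from its Location URL. u.Hostname() drops any ":port" suffix.
func sniFor(location string) (string, error) {
	u, err := url.Parse(location)
	if err != nil {
		return "", err
	}
	return u.Hostname(), nil
}

func main() {
	chain := []string{
		"https://granite-logistics.com:443",
		"https://www.granite-logistics.com:443",
	}
	for _, loc := range chain {
		name, _ := sniFor(loc)
		cfg := &tls.Config{ServerName: name} // fresh config per hop
		fmt.Printf("%s -> SNI %q\n", loc, cfg.ServerName)
	}
}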
I'm not at all familiar with the SNI code (or even the SSL/TLS code to be honest) maybe you have some idea about how simple or difficult this may be to fix?
Another thought- might it make more sense to make --no-sni the default instead of using SNI by default? I don't know when SNI was added, or whether there was a strong case for enabling it by default- just thinking out loud
I'm going to try to get some measurements of successful/failed handshakes with --no-sni vs. the default; I'm very curious how much of an issue this is in practice across many hosts
@Rich5 this turned out to be very easy to fix. If you would like to test out the patched version, you can build from this fork+branch until the fix is merged into master:
https://github.com/mzpqnxow/zgrab2/tree/issue-300-redirect-sni
It's a small change, though it probably needs a second set of eyes- I did confirm that it works and doesn't seem to break anything
@mzpqnxow thanks so much for looking into this! I went ahead and changed the issue name as requested and we'll give the branch a try.
Ok, I tried your branch and it does seem to fix the error; however, Cloudflare returns a 403 Forbidden, which means that while zgrab2 is no longer throwing an error, we may still not be able to use it for this type of case. Your thoughts?
Also, with respect to the --no-sni workaround: it does prevent the error, but it also causes browser_trusted to return false. I believe that's because some CDNs like Cloudflare require SNI in order to serve the correct certificate.
Interesting- it's not immediately clear to me what's causing this (the 403 with the patched branch and SNI enabled, I mean)
I thought maybe it was flagging the user-agent, but that doesn't seem to be it... it's going to take some trial and error to narrow down exactly what client-side behavior Cloudflare doesn't like
This is what I reproduced the 403 with- you probably noticed the failure is still on the last hop (https://www.granite-logistics.com), and it fails regardless of whether it's reached via a redirect:
echo www.granite-logistics.com | ./zgrab2 http --max-redirects 10 --use-https -p 443 --user-agent='curl/7.64.0' -o granite --cipher-suite=chrome-only
Using curl gives a 200:
curl --http1.1 -vvvv https://www.granite-logistics.com -o /dev/null
One thing that jumps out at me (and is the cause of browser_trusted being false) is that zgrab2 is not seeing the correct certificate; it's seeing:
"subject_alt_name": {
"dns_names": [
"*.sites-proxy.hscoscdn30.net",
"hscoscdn30.net",
"*.hscoscdn30.net",
"sites-proxy.hscoscdn30.net"
]
I will try to take a deeper look into it when I get a chance. I first guessed it was an issue with the SNI change, but I checked a packet capture again just to be sure and the SNI looks fine. I also checked that the HTTP headers curl used to get a 200 response were exactly the same as in the zgrab2 test. I'm guessing there's some subtle difference in the zcrypto ssl/tls behavior that Cloudflare does not like- and that curl does not exhibit.
I confirmed it wasn't a cipher-suite or protocol version issue- I used zgrab2 to negotiate TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 over TLSv1.2 and it still failed the same way- the incorrect certificate comes back along with an HTTP 403:
echo www.granite-logistics.com | ./zgrab2 http --max-redirects 10 --use-https -p 443 --user-agent='curl/7.64.0' -o granite --cipher-suite=chrome-only --min-version=0x0302 --max-version=0x0302
Then I used the same SSL parameters with openssl- but got the correct certificates back and an HTTP 200 response:
perl -e 'print "GET / HTTP/1.1\r\nHost: www.granite-logistics.com\r\n\r\n"' | ~/testssl.sh/bin/openssl.Linux.x86_64 s_client -connect www.granite-logistics.com:443 -servername www.granite-logistics.com -debug -showcerts -tls1_2 -cipher ECDHE-ECDSA-AES128-GCM-SHA256
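For what it's worth, the same negotiation can also be pinned down with plain Go crypto/tls (a sketch, not zcrypto- so it won't reproduce zgrab2's exact ClientHello, but it's a quick way to check which certificate comes back for a given SNI name, version, and cipher suite):

package main

import (
	"crypto/tls"
	"fmt"
)

func main() {
	// Pin TLSv1.2 and the same ECDHE-ECDSA-AES128-GCM-SHA256 suite used
	// in the tests above, with SNI set to www.granite-logistics.com.
	cfg := &tls.Config{
		ServerName:   "www.granite-logistics.com",
		MinVersion:   tls.VersionTLS12,
		MaxVersion:   tls.VersionTLS12,
		CipherSuites: []uint16{tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256},
	}
	conn, err := tls.Dial("tcp", "www.granite-logistics.com:443", cfg)
	if err != nil {
		fmt.Println("handshake failed:", err)
		return
	}
	defer conn.Close()
	// Print the SANs of the leaf certificate the server presented for this SNI name.
	leaf := conn.ConnectionState().PeerCertificates[0]
	fmt.Println("dns_names:", leaf.DNSNames)
}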
I'm very interested in why this isn't working but don't have time to look deeper at the moment- but I will try to get back to it soon. Maybe it's something very simple- it will be easier to tell when I have time to dig into the protocol exchange in detail
@Rich5 you may find it helpful to search for golang bug reports- because the zcrypto library is based on an older (modified) version of golang's ssl/tls code, it's possible this issue was reported elsewhere and already fixed upstream. If you can track down the cause and (ideally) the patch from a golang bug report, it will make this a lot simpler
Previously I was searching for errors similar to "tls: oversized record received with length", and it mostly came up in Kubernetes and Docker issues. Most of the answers had to do with a misconfigured proxy, which I don't think is our issue here, but who knows.
I'll dig around some more and see what I can find.
On a somewhat related note, I think this is a good example use-case for either changing the default behavior or adding a new flag- not for the SNI issue specifically, but for when there is a failure on one of the redirects.
It would be similar to the --redirects-succeed option that is already implemented. That option allows the request to be considered a success, which short-circuits the --retry-https logic and allows capture of the responses from all of the requests leading up to exhaustion of the redirect maximum.
In this case, if there were something like a --redirect-sslerror-succeed option, the invalid length error would not occur. I think if the first request gets a valid HTTP response back, there should be no retry as HTTPS- even if there is a handshake failure later in the redirect chain.
I'm just not sure if this should be default behavior or added as an optional flag. I'll open a separate issue for this
@Rich5 that branch should now completely fix the issue if you'd like to try again
There was a mistake in my initial change that caused the SNI server name to include the port, which was causing a different type of failure- the HTTP 403. The corrected patch strips the port before setting it as the SNI server name, and everything works properly now.
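The actual change is in the branch linked above; as a rough Go illustration of what "strip the port" means here (not the patch itself):

package main

import (
	"fmt"
	"net"
)

func main() {
	// An SNI server name must be a bare hostname, so any ":port" suffix
	// has to be removed before it goes into the ClientHello.
	host := "www.granite-logistics.com:443"
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h
	}
	fmt.Println(host) // www.granite-logistics.com
}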
Thanks! I'll test this out today.
Confirmed your patch fixes the problem. Thanks!
Thank you @Rich5 for bringing this up and thanks @dadrian for reviewing and merging the fix
@Rich5 I think you can close this now, the fix is in master
Steps to reproduce:
echo 148.62.23.244,granite-logistics.com | ./zgrab2 http --max-redirects 10 --retry-https
Results:
I see references to this error in various places in other projects, so it makes me think it may be coming from some deep library- but any idea why this is happening? The only other reference in zgrab2 I found is this issue from 2018.