httprb / http

HTTP (The Gem! a.k.a. http.rb) - a fast Ruby HTTP client with a chainable API, streaming support, and timeouts
MIT License

Some sites don't work? #710

Closed Overload119 closed 2 years ago

Overload119 commented 2 years ago

Try Sephora.

Tried this on 5.0 and 4.4:

require 'net/http'
require 'uri'

uri = URI.parse("https://www.sephora.com/")
request = Net::HTTP::Get.new(uri)
request["User-Agent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"

req_options = {
  use_ssl: uri.scheme == "https",
}

response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
  http.request(request)
end
#  => #<Net::HTTPOK 200 OK readbody=true>

Same thing with Curl:

curl 'https://www.sephora.com/' \
  -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36' \
  --compressed
# Body content is returned.

Same thing with HTTP.rb:

require 'http'

headers = {
  'User-Agent' =>
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36',
}
HTTP.headers(headers).get('https://www.sephora.com/')
# Timeout error

ixti commented 2 years ago

I believe it's related to OpenSSL. I can't get the net/http example to work either.

tarcieri commented 2 years ago

Likewise. It hangs for me with both Net::HTTP and http.rb

tarcieri commented 2 years ago

FWIW, here's a user reporting something similar with Python when accessing https://sephora.fr

https://stackoverflow.com/questions/71459063/scrapy-now-timesout-on-a-website-that-used-to-work-well

They make it sound like something that changed on the remote side, and suggested it might be related to Connection: close (but that sounds like a guess).

I get the same hang behavior with both Net::HTTP and http.rb when attempting to make a request to https://sephora.fr

tarcieri commented 2 years ago

For me it's reproducible with the openssl CLI:

$ openssl s_client -connect www.sephora.com:443
CONNECTED(00000005)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Global Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, CN = DigiCert TLS RSA SHA256 2020 CA1
verify return:1
depth=0 C = US, ST = California, L = San Francisco, O = "Sephora USA, Inc.", CN = *.sephora.com
verify return:1
---
Certificate chain
 0 s:/C=US/ST=California/L=San Francisco/O=Sephora USA, Inc./CN=*.sephora.com
   i:/C=US/O=DigiCert Inc/CN=DigiCert TLS RSA SHA256 2020 CA1
 1 s:/C=US/O=DigiCert Inc/CN=DigiCert TLS RSA SHA256 2020 CA1
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert Global Root CA
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIGxTCCBa2gAwIBAgIQDEZ5kGHOg/rtqX+YhmjfLjANBgkqhkiG9w0BAQsFADBP
MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMSkwJwYDVQQDEyBE
aWdpQ2VydCBUTFMgUlNBIFNIQTI1NiAyMDIwIENBMTAeFw0yMjAzMDgwMDAwMDBa
Fw0yMzAzMDgyMzU5NTlaMG4xCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpDYWxpZm9y
bmlhMRYwFAYDVQQHEw1TYW4gRnJhbmNpc2NvMRowGAYDVQQKExFTZXBob3JhIFVT
QSwgSW5jLjEWMBQGA1UEAwwNKi5zZXBob3JhLmNvbTCCASIwDQYJKoZIhvcNAQEB
BQADggEPADCCAQoCggEBAMi02zgEkla4te/hNDQ0tCmPf+54B9CCh6yXE9VS4CVV
voB1kbo41KqPEWuxzEGLuziRgP/aWiDWoZHR2v/WKI+Lut8N7xBuSRg7e74QHH1v
cozoIUa339oRZmUJW87J6lFnFZh2CMPHYKQW6cgz3tnlDDTvSpS2BMfjQ+7DrWhe
PxPxLsP4OTDZF8PQjnlJ3n4XbVDO93UFOGe3h/e9nAxLXSuYFov9CneCg+aZb5GE
V9wM9abymvgo8iEH1gVBMnpZv/ZgfQgZv7mhtN7i5MSvA4cMFSGfgYd7/Qsj4ggg
1p1+vYX59TBvCJIz2rjPWeCOimZNxgDfXjI+68Gp6VMCAwEAAaOCA3wwggN4MB8G
A1UdIwQYMBaAFLdrouqoqoSMeeq02g+YssWVdrn0MB0GA1UdDgQWBBS6Wm8V9Xhe
jvEZ8ID/d7MZjyazmjAlBgNVHREEHjAcgg0qLnNlcGhvcmEuY29tggtzZXBob3Jh
LmNvbTAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUF
BwMCMIGPBgNVHR8EgYcwgYQwQKA+oDyGOmh0dHA6Ly9jcmwzLmRpZ2ljZXJ0LmNv
bS9EaWdpQ2VydFRMU1JTQVNIQTI1NjIwMjBDQTEtMi5jcmwwQKA+oDyGOmh0dHA6
Ly9jcmw0LmRpZ2ljZXJ0LmNvbS9EaWdpQ2VydFRMU1JTQVNIQTI1NjIwMjBDQTEt
Mi5jcmwwPgYDVR0gBDcwNTAzBgZngQwBAgIwKTAnBggrBgEFBQcCARYbaHR0cDov
L3d3dy5kaWdpY2VydC5jb20vQ1BTMH0GCCsGAQUFBwEBBHEwbzAkBggrBgEFBQcw
AYYYaHR0cDovL29jc3AuZGlnaWNlcnQuY29tMEcGCCsGAQUFBzAChjtodHRwOi8v
Y2FjZXJ0cy5kaWdpY2VydC5jb20vRGlnaUNlcnRUTFNSU0FTSEEyNTYyMDIwQ0Ex
LmNydDAMBgNVHRMBAf8EAjAAMIIBfwYKKwYBBAHWeQIEAgSCAW8EggFrAWkAdgDo
PtDaPvUGNTLnVyi8iWvJA9PL0RFr7Otp4Xd9bQa9bgAAAX9reZUsAAAEAwBHMEUC
IQCWOLNpkNQk3kleg4XmYg2Gleq/NIRxRPjH030Pdt7xFgIgMlwvaABB79cbIc7n
t3FAEMmC48+FWatC/kzds0hn2OYAdgA1zxkbv7FsV78PrUxtQsu7ticgJlHqP+Eq
76gDwzvWTAAAAX9reZVIAAAEAwBHMEUCIQDP3bHQAHsY5gYswsr18yPbYLE2gBA4
9uqys0k3j01NfAIgQkTRsFC0rn8xBK13STYpm/XxwU9j5WxO/BF07yxxBE4AdwCz
c3cH4YRQ+GOG1gWp3BEJSnktsWcMC4fc8AMOeTalmgAAAX9reZV7AAAEAwBIMEYC
IQCcTn3aluRrG9SWcaDSoS6mKDSrGvAFXy/Gaoqj1t5UTQIhAKbmHhPQot7p7cEf
u1rsap5+1mr3/lRWmyZhfQ3nEhbmMA0GCSqGSIb3DQEBCwUAA4IBAQC0XA/BOBfj
RIJ1s4EGLfUk3DmD2NV3V2IP65L7IqeKGQTsKzkSpPd9kGLSK26N1D9TAe+3BDfp
axSw/j1c/Il1MI6zWEaI15YMQVaWrbQm5wrPUfbCmlQHz0vsrOzNS14hugz4q7uH
vQPWTPmVUq5NjOpYW8Myks9/b4UIz8CUTg1QL6MwmTszMkZooFkzYAEVS5xrf05E
FFs2Q7h4yeKtg1WEDF1vNDDcI3EmU7/6a6TzEM+d7OdYMORArOSX68R7qMmR9vFi
qrL5x5/s1MocTjlfFcvIy/nNLNqVgMg7PFcXMQkO6rhOvUXV8Bvhp2BvErV4sktP
sTg7wMV1KxRO
-----END CERTIFICATE-----
subject=/C=US/ST=California/L=San Francisco/O=Sephora USA, Inc./CN=*.sephora.com
issuer=/C=US/O=DigiCert Inc/CN=DigiCert TLS RSA SHA256 2020 CA1
---
No client certificate CA names sent
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 3648 bytes and written 314 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-CHACHA20-POLY1305
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-CHACHA20-POLY1305
    Session-ID: DB3AA662176396813001F451A8C4C131CDDD6B922D0EBBD1C5C8B7D80A4B12AB
    Session-ID-ctx:
    Master-Key: 250188875E83EB4B39D46AF40C77C547C292C5EF1F9E78965443ABA117A3778318110290DDED978CD2295E45B6037EB3
    TLS session ticket lifetime hint: 83100 (seconds)
    TLS session ticket:
    0000 - 00 00 0d f0 20 12 3b a0-a7 ed e3 57 b4 ae f8 d6   .... .;....W....
    0010 - 42 b2 61 2d a6 e2 85 6a-af e0 42 f1 19 73 3c 7d   B.a-...j..B..s<}
    0020 - 71 35 df 97 c4 36 44 ed-c2 77 b7 5d 94 49 24 62   q5...6D..w.].I$b
    0030 - 5b 9b 2d 37 70 ee e8 02-a6 9b 03 a1 99 5a 98 95   [.-7p........Z..
    0040 - 7d 43 39 1a 36 5d c4 5b-ba 61 75 e6 82 6f 54 e9   }C9.6].[.au..oT.
    0050 - 04 35 86 5b fb ef 9f 57-82 75 ba f7 e5 74 04 2a   .5.[...W.u...t.*
    0060 - 3e fb 3a e7 11 b9 af 5f-f3 bb 63 a2 75 d4 b4 68   >.:...._..c.u..h
    0070 - 56 2f 4d b2 80 cf 4e 59-df 22 a9 d0 c3 e0 bb 54   V/M...NY.".....T
    0080 - 66 3d 46 d8 e6 f3 59 43-b1 66 7e 96 31 d8 87 4e   f=F...YC.f~.1..N
    0090 - 5c 28 04 cb f5 b6 ec 72-c9 a6 57 21 be 0b 4f 47   \(.....r..W!..OG

    Start Time: 1655756843
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
---
GET / HTTP/1.1
Host: www.sephora.com:443
Connection: close
User-Agent: http.rb/5.1.0

...hangs indefinitely.
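
For anyone who wants to script the s_client session above instead of typing it by hand, here is a minimal Ruby sketch using only the standard library. The helper names `build_raw_request` and `probe` and the 10-second default are assumptions made for illustration and are not part of http.rb. The sketch does not configure certificate verification, so treat it as a diagnostic only.

```ruby
require "socket"
require "openssl"
require "timeout"

# Build the raw HTTP/1.1 request text, similar to what was typed into s_client.
# (Hypothetical helper, not an http.rb API.)
def build_raw_request(host, headers = {})
  lines = ["GET / HTTP/1.1", "Host: #{host}"]
  headers.each { |name, value| lines << "#{name}: #{value}" }
  lines.join("\r\n") + "\r\n\r\n"
end

# Open a TLS connection, send the request, and return the first response line.
# Returns :timeout if nothing comes back within `seconds`.
# (Hypothetical helper; the default context does not verify certificates.)
def probe(host, headers = {}, seconds: 10)
  Timeout.timeout(seconds) do
    tcp = TCPSocket.new(host, 443)
    ssl = OpenSSL::SSL::SSLSocket.new(tcp, OpenSSL::SSL::SSLContext.new)
    ssl.hostname = host    # send SNI, as browsers and curl do
    ssl.sync_close = true  # closing the TLS socket also closes the TCP socket
    begin
      ssl.connect
      ssl.write(build_raw_request(host, headers))
      ssl.gets             # first line of the response, if any arrives
    ensure
      ssl.close
    end
  end
rescue Timeout::Error
  :timeout
end

# Example call (performs a real network request, so it is left commented out):
# probe("www.sephora.com", { "Connection" => "close", "User-Agent" => "http.rb/5.1.0" })
```

Keeping the request builder separate from the socket code makes it easy to swap headers in and out while checking which ones the server reacts to.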

tarcieri commented 2 years ago

Based on that I'm going to close this as being what appears to be a problem with OpenSSL (or possibly an interaction between OpenSSL and the remote TLS stack).

Please reopen if you can provide a reproduction that narrows this down to http.rb

tarcieri commented 2 years ago

SSL Labs wasn't able to make an HTTP request either:

https://www.ssllabs.com/ssltest/analyze.html?d=www.sephora.com#httpRequests

Definitely seems like an issue with that site.

ixti commented 2 years ago

@tarcieri one thing makes me wonder though: the curl example works perfectly fine, Firefox opened that URL without any issues, and httpie too.

tarcieri commented 2 years ago

If it can be reproduced with the openssl CLI, Python, and SSL Labs it is clearly not an http.rb issue.

I'm not sure why curl works as it's ostensibly using OpenSSL as well. Chrome and Firefox work but do not use OpenSSL.

ixti commented 2 years ago

I was able to fix it though :rofl: All we need to do is ensure we send the Connection: keep-alive header. We do that with HTTP.persistent, but we send it as Keep-Alive, which is fine for the majority of servers, but not this one… Here's a working example:

require "bundler/inline"

gemfile do
  source "https://rubygems.org"

  gem "http"
end

module HTTP
  class Connection
    KEEP_ALIVE = "keep-alive"
  end
end

HTTP.persistent("https://www.sephora.com") do |http|
  puts http
    .use(:auto_inflate)
    .headers({ "Accept-Encoding" => "gzip, deflate" })
    .get("https://www.sephora.com/")
end

UPDATE: Somehow I can now make it work without any changes... Just using persistent HTTP:

HTTP.persistent("https://www.sephora.com") do |http|
  puts http.get("https://www.sephora.com/")
end

ixti commented 2 years ago

This is definitely an issue with Sephora's backend server. They are using istio-envoy, which seems to be doing lots of weird stuff. At first, I was able to make it work by ensuring that we send Connection: keep-alive (lowercase was important) and Accept-Encoding headers; now it works without any patches and without any headers...
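
For context on why the lowercase requirement counts as odd server behavior: RFC 7230 section 6.1 says Connection options are case-insensitive, so Keep-Alive and keep-alive should be treated the same. A minimal sketch of a compliant comparison follows; the helper names `connection_options` and `keep_alive?` are hypothetical and exist only for this illustration.

```ruby
# Split a Connection header value into its options, lowercased so that
# comparison is case-insensitive as RFC 7230 section 6.1 specifies.
# (Hypothetical helper, not an http.rb API.)
def connection_options(value)
  value.to_s.split(",").map { |token| token.strip.downcase }.reject(&:empty?)
end

# True when the header value asks for a persistent connection,
# regardless of how the option is capitalized.
def keep_alive?(value)
  connection_options(value).include?("keep-alive")
end
```

A server that compares the raw string against "keep-alive" instead of normalizing it first would show the behavior seen here.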

ixti commented 2 years ago

LOL. Here are some more details. It seems like they are doing a user-agent-based rollout. So if you pass the user-agent as Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36, then you must also ensure that the Connection: keep-alive (case-sensitive), Accept, and Accept-Encoding headers are present:

module HTTP
  class Connection
    KEEP_ALIVE = "keep-alive"
  end
end

HTTP.persistent("https://www.sephora.com") do |http|
  puts http
    .use(:auto_inflate)
    .headers({
      "Accept" => "*/*",
      "Accept-Encoding" => "gzip, deflate",
      "User-Agent" => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"
    })
    .get("https://www.sephora.com/")
end

If you don't pass User-Agent, so that it defaults to http.rb/5.0.1, then neither the Accept-Encoding nor the Accept header is needed:

HTTP.persistent("https://www.sephora.com") do |http|
  puts http.get("https://www.sephora.com/")
end
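
To narrow down which header combination the server actually cares about, one option is to enumerate every combination and try each one with a timeout. The sketch below only generates the combinations. The constant and method names (`BROWSER_UA`, `AXES`, `header_combinations`) are assumptions for illustration, and the axes simply list the variations mentioned in this thread.

```ruby
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 " \
             "(KHTML, like Gecko) Chrome/99.0.4844.84 Safari/537.36"

# Each axis lists the alternatives seen in this thread.
# nil means "do not set the header" (the client's default, if any, applies).
AXES = {
  "User-Agent"      => [nil, BROWSER_UA],
  "Connection"      => ["Keep-Alive", "keep-alive"],
  "Accept"          => [nil, "*/*"],
  "Accept-Encoding" => [nil, "gzip, deflate"]
}.freeze

# Return one header hash per combination of the axes above (hypothetical helper).
def header_combinations(axes = AXES)
  names = axes.keys
  first, *rest = axes.values
  first.product(*rest).map do |values|
    names.zip(values).reject { |_name, value| value.nil? }.to_h
  end
end

# Each hash can then be passed to whatever client is under test, for example
# HTTP.headers(combo).get(url) with a timeout, recording which ones hang.
```

With two alternatives on each of the four axes this yields 16 header sets, which is small enough to run through by hand or in a loop.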