pypi / support

Issue tracker for support requests related to using https://pypi.org

Requests via squid proxies are being blocked from some regions of Google Cloud #404

Closed markroth8 closed 2 years ago

markroth8 commented 4 years ago

My Platform

We are experiencing timeouts when running https_proxy=http://squid-proxy-host:3128/ curl https://pypi.org/ from a client that goes through a squid proxy at squid-proxy-host. Until yesterday this only affected proxies in some of our Google Cloud regions, but now all of our proxied requests are timing out.

$ curl --version
curl 7.58.0 (x86_64-pc-linux-gnu) libcurl/7.58.0 OpenSSL/1.1.1d zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Release-Date: 2018-01-24
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL 

squid: 4.10

Our squid is presently configured as a MITM proxy, but the same behavior seems to happen with or without the MITM.

Network telemetry

Setup:

Client (in Google Cloud) --> Squid proxy (in Google Cloud) --> https://pypi.org/

What is interesting is that a direct curl https://pypi.org/ coming from the squid host works, but the client request fails with a timeout. Hitting other websites works fine - it is only pypi.org that is failing for us.
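
The direct-versus-proxied comparison described above can also be scripted. The following is a minimal Python sketch, not something from the original report: `proxy_map` and `fetch_status` are hypothetical helper names, the proxy URL is the placeholder used in the description, and with a MITM squid it assumes the client already trusts the proxy's CA.

```python
import urllib.error
import urllib.request


def proxy_map(proxy_url=None):
    """Return the proxies dict for ProxyHandler; {} forces a direct connection."""
    if proxy_url is None:
        return {}
    return {"http": proxy_url, "https": proxy_url}


def fetch_status(url, proxy_url=None, timeout=10.0):
    """Fetch url (optionally via proxy_url) and describe the outcome as a string."""
    handler = urllib.request.ProxyHandler(proxy_map(proxy_url))
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, just not with a 2xx/3xx status.
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers connection errors and timeouts.
        return f"failed: {exc!r}"
```

Calling `fetch_status("https://pypi.org/pypi/pip/json", "http://squid-proxy-host:3128/")` from the client and `fetch_status("https://pypi.org/pypi/pip/json")` on the squid host would be one way to compare the two paths with the same timeout.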

DNS Resolution

(From the proxy host)

dig pypi.org A

; <<>> DiG 9.11.3-1ubuntu1.12-Ubuntu <<>> pypi.org A
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28995
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;pypi.org.          IN  A

;; ANSWER SECTION:
pypi.org.       21599   IN  A   151.101.0.223
pypi.org.       21599   IN  A   151.101.192.223
pypi.org.       21599   IN  A   151.101.64.223
pypi.org.       21599   IN  A   151.101.128.223

;; Query time: 2 msec
;; SERVER: 169.254.169.254#53(169.254.169.254)
;; WHEN: Fri May 22 22:28:49 UTC 2020
;; MSG SIZE  rcvd: 101

dig pypi.org AAAA
<Replace with your output>

Traceroutes

IPv4

traceroute pypi.org
traceroute to pypi.org (151.101.0.223), 30 hops max, 60 byte packets
 1  172.17.0.1 (172.17.0.1)  0.022 ms  0.006 ms  0.006 ms
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

HTTPS Requests

IPv4

(from the client)

Some proxies are not blocked and work fine. The same exact proxy deployed in a different Google Cloud region causes a timeout.

$ https_proxy=<bad-proxy> curl https://pypi.org/pypi/pip/json
<times out>

$ https_proxy=<good-proxy> curl https://pypi.org/pypi/pip/json
<works>

(from both good and bad proxy servers, the connection works:)

# curl -vvv -I --ipv4 https://pypi.org/pypi/pip/json
*   Trying 151.101.128.223...
* TCP_NODELAY set
* Connected to pypi.org (151.101.128.223) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Unknown (8):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Client hello (1):
* TLSv1.3 (OUT), TLS Unknown, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: businessCategory=Private Organization; jurisdictionC=US; jurisdictionST=Delaware; serialNumber=3359300; C=US; ST=New Hampshire; L=Wolfeboro; O=Python Software Foundation; CN=www.python.org
*  start date: Sep 18 00:00:00 2018 GMT
*  expire date: Oct 14 12:00:00 2020 GMT
*  subjectAltName: host "pypi.org" matched cert's "pypi.org"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* Using Stream ID: 1 (easy handle 0x563eeca6c580)
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
> HEAD /pypi/pip/json HTTP/2
> Host: pypi.org
> User-Agent: curl/7.58.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS Unknown, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* TLSv1.3 (OUT), TLS Unknown, Unknown (23):
* TLSv1.3 (IN), TLS Unknown, Unknown (23):
< HTTP/2 200 
HTTP/2 200 
< access-control-allow-headers: Content-Type, If-Match, If-Modified-Since, If-None-Match, If-Unmodified-Since
access-control-allow-headers: Content-Type, If-Match, If-Modified-Since, If-None-Match, If-Unmodified-Since
< access-control-allow-methods: GET
access-control-allow-methods: GET
< access-control-allow-origin: *
access-control-allow-origin: *
< access-control-expose-headers: X-PyPI-Last-Serial
access-control-expose-headers: X-PyPI-Last-Serial
< access-control-max-age: 86400
access-control-max-age: 86400
< cache-control: max-age=900, public
cache-control: max-age=900, public
< content-security-policy: base-uri 'self'; block-all-mixed-content; connect-src 'self' https://api.github.com/repos/ *.fastly-insights.com sentry.io https://api.pwnedpasswords.com https://2p66nmmycsj3.statuspage.io; default-src 'none'; font-src 'self' fonts.gstatic.com; form-action 'self'; frame-ancestors 'none'; frame-src 'none'; img-src 'self' https://warehouse-camo.ingress.cmh1.psfhosted.org/ www.google-analytics.com *.fastly-insights.com; script-src 'self' www.googletagmanager.com www.google-analytics.com *.fastly-insights.com https://cdn.ravenjs.com; style-src 'self' fonts.googleapis.com; worker-src *.fastly-insights.com
content-security-policy: base-uri 'self'; block-all-mixed-content; connect-src 'self' https://api.github.com/repos/ *.fastly-insights.com sentry.io https://api.pwnedpasswords.com https://2p66nmmycsj3.statuspage.io; default-src 'none'; font-src 'self' fonts.gstatic.com; form-action 'self'; frame-ancestors 'none'; frame-src 'none'; img-src 'self' https://warehouse-camo.ingress.cmh1.psfhosted.org/ www.google-analytics.com *.fastly-insights.com; script-src 'self' www.googletagmanager.com www.google-analytics.com *.fastly-insights.com https://cdn.ravenjs.com; style-src 'self' fonts.googleapis.com; worker-src *.fastly-insights.com
< content-type: application/json
content-type: application/json
< etag: "1bScw84sINFW/pIshXUiKQ"
etag: "1bScw84sINFW/pIshXUiKQ"
< referrer-policy: origin-when-cross-origin
referrer-policy: origin-when-cross-origin
< server: nginx/1.13.9
server: nginx/1.13.9
< x-pypi-last-serial: 7292768
x-pypi-last-serial: 7292768
< accept-ranges: bytes
accept-ranges: bytes
< date: Fri, 22 May 2020 22:31:57 GMT
date: Fri, 22 May 2020 22:31:57 GMT
< x-served-by: cache-bwi5125-BWI, cache-wdc5569-WDC
x-served-by: cache-bwi5125-BWI, cache-wdc5569-WDC
< x-cache: HIT, HIT
x-cache: HIT, HIT
< x-cache-hits: 1, 1
x-cache-hits: 1, 1
< x-timer: S1590186717.113120,VS0,VE1
x-timer: S1590186717.113120,VS0,VE1
< vary: Accept-Encoding, Accept-Encoding
vary: Accept-Encoding, Accept-Encoding
< strict-transport-security: max-age=31536000; includeSubDomains; preload
strict-transport-security: max-age=31536000; includeSubDomains; preload
< x-frame-options: deny
x-frame-options: deny
< x-xss-protection: 1; mode=block
x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
x-content-type-options: nosniff
< x-permitted-cross-domain-policies: none
x-permitted-cross-domain-policies: none
< content-length: 99851
content-length: 99851

< 
* Connection #0 to host pypi.org left intact

TLS Debug

IPv4

(from proxy server)

# echo -n | openssl s_client -4 -connect pypi.org:443
CONNECTED(00000005)
depth=2 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert High Assurance EV Root CA
verify return:1
depth=1 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 Extended Validation Server CA
verify return:1
depth=0 businessCategory = Private Organization, jurisdictionC = US, jurisdictionST = Delaware, serialNumber = 3359300, C = US, ST = New Hampshire, L = Wolfeboro, O = Python Software Foundation, CN = www.python.org
verify return:1
---
Certificate chain
 0 s:businessCategory = Private Organization, jurisdictionC = US, jurisdictionST = Delaware, serialNumber = 3359300, C = US, ST = New Hampshire, L = Wolfeboro, O = Python Software Foundation, CN = www.python.org
   i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 Extended Validation Server CA
 1 s:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 Extended Validation Server CA
   i:C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert High Assurance EV Root CA
---
Server certificate
-----BEGIN CERTIFICATE-----
MIIIczCCB1ugAwIBAgIQDl7PGBeDAG2brEU2EfVJEjANBgkqhkiG9w0BAQsFADB1
MQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMRkwFwYDVQQLExB3
d3cuZGlnaWNlcnQuY29tMTQwMgYDVQQDEytEaWdpQ2VydCBTSEEyIEV4dGVuZGVk
IFZhbGlkYXRpb24gU2VydmVyIENBMB4XDTE4MDkxODAwMDAwMFoXDTIwMTAxNDEy
MDAwMFowgdgxHTAbBgNVBA8MFFByaXZhdGUgT3JnYW5pemF0aW9uMRMwEQYLKwYB
BAGCNzwCAQMTAlVTMRkwFwYLKwYBBAGCNzwCAQITCERlbGF3YXJlMRAwDgYDVQQF
EwczMzU5MzAwMQswCQYDVQQGEwJVUzEWMBQGA1UECBMNTmV3IEhhbXBzaGlyZTES
MBAGA1UEBxMJV29sZmVib3JvMSMwIQYDVQQKExpQeXRob24gU29mdHdhcmUgRm91
bmRhdGlvbjEXMBUGA1UEAxMOd3d3LnB5dGhvbi5vcmcwggEiMA0GCSqGSIb3DQEB
AQUAA4IBDwAwggEKAoIBAQD2roGTDc+m3g3mReIf7j5rzLovnk3Zggvy2kWWMnDB
Yz2m6sgxOL1pkzIu7YNe4noH7ZRym0OIQjZbDIleB7SOGAoD2BBv+Mv79HtedJ3d
vhXwmgRenlNxBxhDcaVLDONVC9Ir3Ft46uuGIrXnKB8L1KZV6h8IFC2G3GZXJtkw
EojXdmJFzuRrKMRi0cLPlF60upRISIj3jg9kK4D+f0xYrsQAJvK0veqRsNQ8bXPE
YR8kG4Paj7rOeh3FVGxrKOKNpoI+kV6EqOVIhJY698P7EbDLGjG2Im9P6VFmwXod
YAZ2CTFaMlrC0FljAO/FKeQKR29vitthLMutcd+AkoDDAgMBAAGjggSZMIIElTAf
BgNVHSMEGDAWgBQ901Cl1qCt7vNKYApl0yHU+PjWDzAdBgNVHQ4EFgQUUTsyHAXZ
y6d2HWn+8MZNUhBCkEwwggFCBgNVHREEggE5MIIBNYIOd3d3LnB5dGhvbi5vcmeC
D2RvY3MucHl0aG9uLm9yZ4IPYnVncy5weXRob24ub3Jngg93aWtpLnB5dGhvbi5v
cmeCDWhnLnB5dGhvbi5vcmeCD21haWwucHl0aG9uLm9yZ4IPcHlwaS5weXRob24u
b3JnghRwYWNrYWdpbmcucHl0aG9uLm9yZ4IQbG9naW4ucHl0aG9uLm9yZ4ISZGlz
Y3Vzcy5weXRob24ub3Jnggx1cy5weWNvbi5vcmeCB3B5cGkuaW+CDGRvY3MucHlw
aS5pb4IIcHlwaS5vcmeCDWRvY3MucHlwaS5vcmeCD2RvbmF0ZS5weXBpLm9yZ4IT
ZGV2Z3VpZGUucHl0aG9uLm9yZ4ITd3d3LmJ1Z3MucHl0aG9uLm9yZ4IKcHl0aG9u
Lm9yZzAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUF
BwMCMHUGA1UdHwRuMGwwNKAyoDCGLmh0dHA6Ly9jcmwzLmRpZ2ljZXJ0LmNvbS9z
aGEyLWV2LXNlcnZlci1nMi5jcmwwNKAyoDCGLmh0dHA6Ly9jcmw0LmRpZ2ljZXJ0
LmNvbS9zaGEyLWV2LXNlcnZlci1nMi5jcmwwSwYDVR0gBEQwQjA3BglghkgBhv1s
AgEwKjAoBggrBgEFBQcCARYcaHR0cHM6Ly93d3cuZGlnaWNlcnQuY29tL0NQUzAH
BgVngQwBATCBiAYIKwYBBQUHAQEEfDB6MCQGCCsGAQUFBzABhhhodHRwOi8vb2Nz
cC5kaWdpY2VydC5jb20wUgYIKwYBBQUHMAKGRmh0dHA6Ly9jYWNlcnRzLmRpZ2lj
ZXJ0LmNvbS9EaWdpQ2VydFNIQTJFeHRlbmRlZFZhbGlkYXRpb25TZXJ2ZXJDQS5j
cnQwDAYDVR0TAQH/BAIwADCCAX8GCisGAQQB1nkCBAIEggFvBIIBawFpAHcA7ku9
t3XOYLrhQmkfq+GeZqMPfl+wctiDAMR7iXqo/csAAAFl7l3MWwAABAMASDBGAiEA
qfAJSfOHG4r8YvzTkZsr8cEXFcnFIns40+JXVdgY0vsCIQCvnB+YExtMRQVQXONc
glOTsIYmNgYPrtyHYsTj/Xua5QB3AFYUBpov18Ls0/XhvUSyPsdGdrm8mRFcwO+U
mFXWidDdAAABZe5dzHoAAAQDAEgwRgIhAIF6xXakKVdREciK8aM2z1c71eiU8qF/
UCZbl4sEfLTQAiEA870pR9Hazod3FgZszr9itk8sYPLoQjV9/WENl0HXXGwAdQC7
2d+8H4pxtZOUI5eqkntHOFeVCqtS6BqQlmQ2jh7RhQAAAWXuXcwzAAAEAwBGMEQC
IAeIvcaqPATJrCo3+ceBlVbsyPJcDPM7QGLpaRPFBduQAiBHNfcsBLaelS9bUmfU
R0VjJLoTA39AJAj8oU6iAzruHDANBgkqhkiG9w0BAQsFAAOCAQEAwH5+PXougfo5
qMVIE65Dei/CEb4ahZfoMvvlJRvuNAI/ARWYQSHh6IKXdxFxIE5hCC/NNtdAcc1p
CoZ2+IM/x0cGBjHZegSjhXfQy+w7twfgyeTSNalV2jzKQ0Yv/JvfI9qiMdhEQbfL
qaQ6Nj/js8uvQqWf6w8yo7hAzKs1jTF7Wy/cM0lvqNocDWYROhAVcI+jSMKwlcLv
75xAbOZqw2D483mkQizVj7wQ1fYP3Tr0wfvFNeoIAPUXLrEHiEmHWA0h+U9KEgVf
VQ3PvzItecKWFI+0d9RpNupc1HdipBCySvnNa1LcG2AviYXpIsbo8EBvbj0HcEVx
7NEANHNX2g==
-----END CERTIFICATE-----
subject=businessCategory = Private Organization, jurisdictionC = US, jurisdictionST = Delaware, serialNumber = 3359300, C = US, ST = New Hampshire, L = Wolfeboro, O = Python Software Foundation, CN = www.python.org

issuer=C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert SHA2 Extended Validation Server CA

---
No client certificate CA names sent
Peer signing digest: SHA512
Peer signature type: RSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 4025 bytes and written 403 bytes
Verification: OK
---
New, TLSv1.2, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES128-GCM-SHA256
    Session-ID: 901EF04A764DB327FC92FF4052DC2263D4885A72AFFA1A3A2F70C8727BE27159
    Session-ID-ctx: 
    Master-Key: 095D84DBFFEB177D9F2C3F242C18E23673D0B2EFCDEC86FFE4901486CA3B1591779ABAF4212ABE8C7E7701366995041F
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    TLS session ticket lifetime hint: 7200 (seconds)
    TLS session ticket:
    0000 - 3d 80 0e 15 4b ba 35 69-20 cc 62 05 bb 58 44 82   =...K.5i .b..XD.
    0010 - 87 3c 75 86 43 ff 0d b2-53 39 ab 38 4a 64 a5 ae   .<u.C...S9.8Jd..
    0020 - 89 c3 e9 24 cf 79 85 c4-05 92 8c 3b ee a0 47 d8   ...$.y.....;..G.
    0030 - 92 b4 1e a3 59 1f c3 1b-38 95 8d 20 8b 5c 41 8a   ....Y...8.. .\A.
    0040 - af d2 9c 79 38 95 bd cd-88 eb e1 08 d6 87 11 a7   ...y8...........
    0050 - cd 5c ed 7c f3 53 84 2b-e4 14 f3 fd 34 2b 7d ef   .\.|.S.+....4+}.
    0060 - 05 f5 f2 4c 1b 67 6d 21-d4 07 d3 49 da 8b d6 d0   ...L.gm!...I....
    0070 - ec e2 10 9b 7c 4f 85 2a-4b 51 69 b6 89 f3 f7 a0   ....|O.*KQi.....
    0080 - dd e7 71 d8 8a 66 60 13-9f f9 f1 d2 d8 d3 af fb   ..q..f`.........
    0090 - a9 ea 3c 90 43 bf 85 46-3b 11 85 e4 c5 c3 4f 58   ..<.C..F;.....OX
    00a0 - 48 8c 34 e5 aa 65 50 17-95 95 65 f0 f0 2b 02 58   H.4..eP...e..+.X

    Start Time: 1590187138
    Timeout   : 7200 (sec)
    Verify return code: 0 (ok)
    Extended master secret: yes
---
DONE

markroth8 commented 4 years ago

Still happening today. I was able to get some more debug information and this gets stranger still...

I made two requests with curl https://pypi.org/, one from the client (proxied through squid) and one straight from the squid proxy machine. The first one hangs and the second one works.

I've captured HTTP headers via endpoints.dev. HTTP headers from the first:

 - {"cf-ipcountry":"US"}
   {"host":"<redacted>.endpoints.dev"}
   {"cf-connecting-ip":"35.203.57.152"}
   {"connection":"Keep-Alive"}
   {"x-forwarded-for":"35.203.57.152, 108.162.219.208"}
   {"cache-control":"max-age=259200"}
   {"accept-encoding":"gzip"}
   {"accept":"*/*"}
   {"user-agent":"curl/7.65.0"}

HTTP headers from the second:

 - {"cf-ipcountry":"US"}
   {"host":"<redacted>.endpoints.dev"}
   {"cf-connecting-ip":"35.203.57.152"}
   {"connection":"Keep-Alive"}
   {"x-forwarded-for":"35.203.57.152, 173.245.52.163"}
   {"accept-encoding":"gzip"}
   {"accept":"*/*"}
   {"user-agent":"curl/7.65.0"}

So, they're almost identical but one succeeds and the other does not.

I could use some help debugging this further.

markroth8 commented 4 years ago

Another data point: http is not blocked from the client machine through the proxy - I get a 301 to https://pypi.org/ when I request http://pypi.org/ via curl. But https hangs.

alex commented 4 years ago

If I were the one debugging this, my next move would be to pcap squid or strace it and figure out where in the process it's hanging, since, as I understand it, you've narrowed this down to something inside squid itself: curl on the same box works.

markroth8 commented 4 years ago

Thanks, @alex. I should be more careful with my description - the squid process itself is not hanging - all other requests going through squid continue to work properly. Squid is timing out waiting for a successful response from pypi.org.

alex commented 4 years ago

Ok, yes. I guess my question is "where in the request/response cycle is it hanging?": opening a TCP connection? TLS handshake? waiting for an HTTP response? reading the HTTP response body?
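
As a rough illustration of that question, the Python sketch below times each phase separately from the machine it runs on. It is only an assumption about how one might instrument this, not what squid does internally; `build_request` and `time_phases` are hypothetical names, and it connects directly rather than through a proxy.

```python
import socket
import ssl
import time


def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 HEAD request as bytes."""
    return (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def time_phases(host, port=443, path="/", timeout=10.0):
    """Return (phase, seconds) pairs; a failed phase is recorded and ends the run."""
    timings = []
    sock = None
    phase, start = "tcp_connect", time.monotonic()
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
        timings.append((phase, time.monotonic() - start))

        phase, start = "tls_handshake", time.monotonic()
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)
        timings.append((phase, time.monotonic() - start))

        phase, start = "http_response", time.monotonic()
        sock.sendall(build_request(host, path))
        sock.recv(4096)  # first bytes of the response
        timings.append((phase, time.monotonic() - start))
    except OSError as exc:
        # Timeouts and ssl.SSLError are both OSError subclasses.
        timings.append((f"{phase} failed: {exc!r}", time.monotonic() - start))
    finally:
        if sock is not None:
            sock.close()
    return timings
```

Running `time_phases("pypi.org", path="/pypi/pip/json")` on an affected host would show which phase, if any, runs into the timeout.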

markroth8 commented 4 years ago

Thanks much for your time helping us debug this.

Good suggestion to get more logs. Last week, we were able to turn on increased squid logging and captured the following log. The following two lines lead us to believe there's an issue negotiating TLS. Note that the proxy works for all other sites, and it was working for pypi.org until a few of our proxies started exhibiting this behavior last week. Today, all of our squid proxies have this issue.

Suspicious lines:

2020/05/21 19:27:28.580| 83,5| Handshake.cc(405) parseExtensions: first unsupported extension: 43
2020/05/21 19:27:28.580| 24,5| BinaryTokenizer.cc(47) want: 1 more bytes for TLSPlaintext.type occupying 1 bytes @4143 in 0x1b94840;
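
For reference, TLS extension type 43 is `supported_versions` in the IANA registry, the extension TLS 1.3 uses for version negotiation. Whether that is the cause here is not established in this thread. The Python sketch below pulls the code out of a squid debug line and names it; the table is deliberately partial, and `name_unsupported_extension` is a hypothetical helper.

```python
import re

# A few TLS extension type codes from the IANA registry (not exhaustive).
TLS_EXTENSION_NAMES = {
    0: "server_name",
    10: "supported_groups",
    13: "signature_algorithms",
    16: "application_layer_protocol_negotiation",
    23: "extended_master_secret",
    35: "session_ticket",
    43: "supported_versions",
    51: "key_share",
}

UNSUPPORTED_EXT = re.compile(r"first unsupported extension: (\d+)")


def name_unsupported_extension(log_line):
    """Return (code, name) for a squid 'first unsupported extension' line, else None."""
    match = UNSUPPORTED_EXT.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, TLS_EXTENSION_NAMES.get(code, "unknown")


line = (
    "2020/05/21 19:27:28.580| 83,5| Handshake.cc(405) "
    "parseExtensions: first unsupported extension: 43"
)
print(name_unsupported_extension(line))
```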

Are there any logs on pypi.org's side that could be helpful in debugging this issue?

Here is the full log:

2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(88) prepareSocket: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1, this=0x1d3b788
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(94) prepareSocket: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1
2020/05/21 19:27:28.567| 9,5| AsyncCall.cc(26) AsyncCall: The AsyncCall Security::PeerConnector::commCloseHandler constructed, this=0x1d2c990 [call2471]
2020/05/21 19:27:28.567| 5,5| comm.cc(985) comm_add_close_handler: comm_add_close_handler: FD 14, AsyncCall=0x1d2c990*1
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(107) initialize: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1, ctx=0x176ea70
2020/05/21 19:27:28.567| 83,5| Session.cc(103) NewSessionObject: SSL_new session=0x1d37d50
2020/05/21 19:27:28.567| 83,5| bio.cc(612) squid_bio_ctrl: 0x1d1a380 104(6001, 0x7ffe0c004ee4)
2020/05/21 19:27:28.567| 83,5| Session.cc(161) CreateSession: link FD 14 to TLS session=0x1d37d50
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(123) initialize: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1, session=0x1d37d50
2020/05/21 19:27:28.567| 14,3| Address.cc(382) lookupHostIP: Given Non-IP 'pypi.org': Name or service not known
2020/05/21 19:27:28.567| 83,5| BlindPeerConnector.cc(60) initialize: success
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(188) negotiate: SSL_connect session=0x1d37d50
2020/05/21 19:27:28.567| 83,5| bio.cc(612) squid_bio_ctrl: 0x1d1a380 6(0, 0x1d16bf0)
2020/05/21 19:27:28.568| 83,5| bio.cc(113) write: FD 14 wrote 314 <= 314
2020/05/21 19:27:28.568| 83,5| bio.cc(612) squid_bio_ctrl: 0x1d1a380 11(0, 0)
2020/05/21 19:27:28.568| 83,5| bio.cc(136) read: FD 14 read -1 <= 65535
2020/05/21 19:27:28.568| 83,5| bio.cc(141) read: error: 11 ignored: 1
2020/05/21 19:27:28.568| 83,5| PeerConnector.cc(462) noteWantRead: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1
2020/05/21 19:27:28.568| 5,3| comm.cc(559) commSetConnTimeout: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1 timeout 60
2020/05/21 19:27:28.568| 5,5| ModEpoll.cc(117) SetSelect: FD 14, type=1, handler=1, client_data=0x1d2ca60, timeout=0
2020/05/21 19:27:28.568| 93,5| AsyncJob.cc(154) callEnd: Security::BlindPeerConnector status out: [ FD 14 job74]
2020/05/21 19:27:28.568| 93,5| AsyncCallQueue.cc(57) fireNext: leaving AsyncJob::start()
2020/05/21 19:27:28.580| 83,5| PeerConnector.cc(188) negotiate: SSL_connect session=0x1d37d50
2020/05/21 19:27:28.580| 83,5| bio.cc(136) read: FD 14 read 4143 <= 65535
2020/05/21 19:27:28.580| 83,5| Handshake.cc(405) parseExtensions: first unsupported extension: 43
2020/05/21 19:27:28.580| 24,5| BinaryTokenizer.cc(47) want: 1 more bytes for TLSPlaintext.type occupying 1 bytes @4143 in 0x1b94840;
2020/05/21 19:27:28.580| 83,5| Handshake.cc(533) parseHello: need more data
2020/05/21 19:27:28.580| 83,5| PeerConnector.cc(462) noteWantRead: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1
2020/05/21 19:27:28.580| 5,3| comm.cc(559) commSetConnTimeout: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1 timeout 60
2020/05/21 19:27:28.580| 5,5| ModEpoll.cc(117) SetSelect: FD 14, type=1, handler=1, client_data=0x1d18730, timeout=0
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(26) AsyncCall: The AsyncCall logfileFlush constructed, this=0x1d018e0 [call2473]
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(93) ScheduleCall: event.cc(241) will call logfileFlush(0x1895498*?) [call2473]
2020/05/21 19:27:29.129| 41,5| AsyncCallQueue.cc(55) fireNext: entering logfileFlush(0x1895498*?)
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(38) make: make call logfileFlush [call2473]
2020/05/21 19:27:29.129| 41,5| AsyncCallQueue.cc(57) fireNext: leaving logfileFlush(0x1895498*?)
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(26) AsyncCall: The AsyncCall MaintainSwapSpace constructed, this=0x1d018e0 [call2474]
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(93) ScheduleCall: event.cc(241) will call MaintainSwapSpace() [call2474]
2020/05/21 19:27:29.129| 41,5| AsyncCallQueue.cc(55) fireNext: entering MaintainSwapSpace()
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(38) make: make call MaintainSwapSpace [call2474]
2020/05/21 19:27:29.129| 41,5| AsyncCallQueue.cc(57) fireNext: leaving MaintainSwapSpace()
2020/05/21 19:27:29.651| 41,5| AsyncCall.cc(26) AsyncCall: The AsyncCall memPoolCleanIdlePools constructed, this=0x1d018e0 [call2475]

alex commented 4 years ago

Ok, so based on my read of this (I don't know anything about squid, but I know a lot about OpenSSL... because life decisions):

Seems like there are two possibilities here:
a) The server is never sending that byte
b) Something is losing it after the kernel acks it at the TCP level

The server in this case is actually the fastly CDN, so our ability to instrument that side ourselves is a bit limited. I think my next move might be to try to get a pcap, if that's possible?

di commented 4 years ago

From the #pypa IRC channel earlier today, seems possibly related:

> Hello everyone, we started seeing ssl authentication errors due to this bug: https://bugs.openjdk.java.net/browse/JDK-8213202 , starting on the 14th of May. We are updating our java library version shortly to address this issue. Would anyone know if there was a change to pypi.org on or about 14th of May that could account for this? Thank you.

markroth8 commented 4 years ago

Thanks, @di, for the issue reference. I'm not sure that a race condition explains what we are seeing, as it happens 100% of the time. But it might be a clue as to some changing server-side behavior because things used to work for us.

@alex glad you made the life decisions you made to help us understand this better. Your read of this sounds correct to me - there seems to be an error in the protocol negotiation and it is one byte short of what it is expecting. I don't think a pcap will help because the traffic is encrypted anyway.

It seems unlikely that this is at the TCP level because we see no issues with any other sites. Seems more like a TLS protocol-level issue.
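
One way to probe the TLS-level theory would be to cap the client's maximum protocol version and see what gets negotiated; in the output above, curl reported TLSv1.3 while openssl s_client reported TLSv1.2. This is a Python sketch under that assumption, with hypothetical helper names `make_context` and `negotiated_version`, and it talks to the server directly rather than through squid.

```python
import socket
import ssl


def make_context(max_version=None):
    """Default client context, optionally capped at max_version (an ssl.TLSVersion)."""
    context = ssl.create_default_context()
    if max_version is not None:
        context.maximum_version = max_version
    return context


def negotiated_version(host, port=443, max_version=None, timeout=10.0):
    """Connect to host:port and return the negotiated protocol, e.g. 'TLSv1.3'."""
    context = make_context(max_version)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Comparing `negotiated_version("pypi.org")` with `negotiated_version("pypi.org", max_version=ssl.TLSVersion.TLSv1_2)` would show whether both protocol versions complete a handshake from a given host.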

We've been researching the issue from the squid side as well, and it seems like there are a lot of posts about squid mitm issues on the mailing list: http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html (the post from hanxie is ours).

On our side, we might attempt a different mitm proxy to see if it changes anything. On the pypi side, I'm curious as to the answer to the IRC user's question. Was there a change to pypi.org this month?

di commented 4 years ago

> On the pypi side, I'm curious as to the answer to the IRC user's question. Was there a change to pypi.org this month?

No, I'm not aware of anything that we changed which could explain this. I suppose it's possible something changed with our CDN provider (Fastly) but I'm not sure what that would be.

alex commented 4 years ago

We don't have visibility into when fastly deploys stuff to their edge, but I'd be willing to bet they're constantly making changes :-)

pradyunsg commented 2 years ago

Is this still relevant?

di commented 2 years ago

Given that this is quite old and we haven't received similar reports, I'm closing this.