Closed markroth8 closed 2 years ago
Still happening today. I was able to get some more debug information and this gets stranger still...
Two requests to curl https://pypi.org/, one from the client (proxied through squid) and one straight from the squid proxy machine. The first one hangs and the second one works.
I've captured HTTP headers via endpoints.dev. HTTP headers from the first:
{"cf-ipcountry":"US"}
{"host":"<redacted>.endpoints.dev"}
{"cf-connecting-ip":"35.203.57.152"}
{"connection":"Keep-Alive"}
{"x-forwarded-for":"35.203.57.152, 108.162.219.208"}
{"cache-control":"max-age=259200"}
{"accept-encoding":"gzip"}
{"accept":"*/*"}
{"user-agent":"curl/7.65.0"}
HTTP headers from the second:
{"cf-ipcountry":"US"}
{"host":"<redacted>.endpoints.dev"}
{"cf-connecting-ip":"35.203.57.152"}
{"connection":"Keep-Alive"}
{"x-forwarded-for":"35.203.57.152, 173.245.52.163"}
{"accept-encoding":"gzip"}
{"accept":"*/*"}
{"user-agent":"curl/7.65.0"}
So, they're almost identical but one succeeds and the other does not.
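To make the comparison mechanical, here's a quick diff of the two captures (header values copied verbatim from the dumps above):

```python
# Compare the two captured header sets to confirm exactly how they differ.
first = {
    "cf-ipcountry": "US",
    "host": "<redacted>.endpoints.dev",
    "cf-connecting-ip": "35.203.57.152",
    "connection": "Keep-Alive",
    "x-forwarded-for": "35.203.57.152, 108.162.219.208",
    "cache-control": "max-age=259200",
    "accept-encoding": "gzip",
    "accept": "*/*",
    "user-agent": "curl/7.65.0",
}
second = {
    "cf-ipcountry": "US",
    "host": "<redacted>.endpoints.dev",
    "cf-connecting-ip": "35.203.57.152",
    "connection": "Keep-Alive",
    "x-forwarded-for": "35.203.57.152, 173.245.52.163",
    "accept-encoding": "gzip",
    "accept": "*/*",
    "user-agent": "curl/7.65.0",
}

# Headers present in only the failing request, plus headers whose values differ.
only_in_first = set(first) - set(second)
changed = {k for k in set(first) & set(second) if first[k] != second[k]}
print(sorted(only_in_first))  # ['cache-control']
print(sorted(changed))        # ['x-forwarded-for']
```

So the only differences are the cache-control header and the second hop in x-forwarded-for (a different Cloudflare edge IP), neither of which obviously explains a hang.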
I could use some help debugging this further.
Another data point: http is not blocked from the client machine through the proxy - I get a 301 to https://pypi.org/ when I request http://pypi.org/ via curl. But https hangs.
If I were the one debugging this, my next move would be to pcap squid or strace it, and figure out where in the process it's hanging, since, as I understand it, you've narrowed this down to something inside squid itself (curl on the same box works).
Thanks, @alex. I should be more careful with my description - the squid process itself is not hanging - all other requests going through squid continue to work properly. Squid is timing out waiting for a successful response from pypi.org.
Ok, yes. I guess my question is "where in the request/response cycle is it hanging": opening a TCP connection? TLS handshake? Waiting for an HTTP response? Reading the HTTP response body?
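For anyone following along, one rough way to answer that question from the proxy host is to attempt each phase separately and see which one stalls or errors. The sketch below is illustrative, not squid's own logic; the function name and phase labels are mine, and certificate checks are disabled on purpose since we only care about where the connection gets stuck:

```python
import socket
import ssl
import time

def https_stage_probe(host, port=443, timeout=5):
    """Attempt each phase of an HTTPS request separately and report the
    first phase that fails (or 'ok'), plus elapsed seconds."""
    start = time.monotonic()
    # Phase 1: TCP connect.
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-connect", time.monotonic() - start

    # Phase 2: TLS handshake (verification disabled for diagnosis only).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        tls = ctx.wrap_socket(sock, server_hostname=host)
    except (ssl.SSLError, OSError):
        sock.close()
        return "tls-handshake", time.monotonic() - start

    # Phase 3: HTTP request/response over the established session.
    try:
        request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        tls.sendall(request.encode())
        tls.recv(1024)
    except OSError:
        tls.close()
        return "http-response", time.monotonic() - start

    tls.close()
    return "ok", time.monotonic() - start
```

Running `https_stage_probe("pypi.org")` from a good and a bad proxy host and comparing where they diverge would narrow things down considerably.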
Thanks much for your time helping us debug this.
Good suggestion to get more logs. Last week, we were able to turn on increased squid logging and captured the following log. The following two lines lead us to believe there's an issue negotiating TLS. Note that the proxy works for all other sites, and it was working for pypi.org until a few of our proxies started exhibiting this behavior last week. Today, all of our squid proxies have this issue.
Suspicious lines:
2020/05/21 19:27:28.580| 83,5| Handshake.cc(405) parseExtensions: first unsupported extension: 43
2020/05/21 19:27:28.580| 24,5| BinaryTokenizer.cc(47) want: 1 more bytes for TLSPlaintext.type occupying 1 bytes @4143 in 0x1b94840;
Are there any logs on pypi.org's side that could be helpful in debugging this issue?
Here is the full log:
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(88) prepareSocket: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1, this=0x1d3b788
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(94) prepareSocket: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1
2020/05/21 19:27:28.567| 9,5| AsyncCall.cc(26) AsyncCall: The AsyncCall Security::PeerConnector::commCloseHandler constructed, this=0x1d2c990 [call2471]
2020/05/21 19:27:28.567| 5,5| comm.cc(985) comm_add_close_handler: comm_add_close_handler: FD 14, AsyncCall=0x1d2c990*1
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(107) initialize: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1, ctx=0x176ea70
2020/05/21 19:27:28.567| 83,5| Session.cc(103) NewSessionObject: SSL_new session=0x1d37d50
2020/05/21 19:27:28.567| 83,5| bio.cc(612) squid_bio_ctrl: 0x1d1a380 104(6001, 0x7ffe0c004ee4)
2020/05/21 19:27:28.567| 83,5| Session.cc(161) CreateSession: link FD 14 to TLS session=0x1d37d50
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(123) initialize: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1, session=0x1d37d50
2020/05/21 19:27:28.567| 14,3| Address.cc(382) lookupHostIP: Given Non-IP 'pypi.org': Name or service not known
2020/05/21 19:27:28.567| 83,5| BlindPeerConnector.cc(60) initialize: success
2020/05/21 19:27:28.567| 83,5| PeerConnector.cc(188) negotiate: SSL_connect session=0x1d37d50
2020/05/21 19:27:28.567| 83,5| bio.cc(612) squid_bio_ctrl: 0x1d1a380 6(0, 0x1d16bf0)
2020/05/21 19:27:28.568| 83,5| bio.cc(113) write: FD 14 wrote 314 <= 314
2020/05/21 19:27:28.568| 83,5| bio.cc(612) squid_bio_ctrl: 0x1d1a380 11(0, 0)
2020/05/21 19:27:28.568| 83,5| bio.cc(136) read: FD 14 read -1 <= 65535
2020/05/21 19:27:28.568| 83,5| bio.cc(141) read: error: 11 ignored: 1
2020/05/21 19:27:28.568| 83,5| PeerConnector.cc(462) noteWantRead: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1
2020/05/21 19:27:28.568| 5,3| comm.cc(559) commSetConnTimeout: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1 timeout 60
2020/05/21 19:27:28.568| 5,5| ModEpoll.cc(117) SetSelect: FD 14, type=1, handler=1, client_data=0x1d2ca60, timeout=0
2020/05/21 19:27:28.568| 93,5| AsyncJob.cc(154) callEnd: Security::BlindPeerConnector status out: [ FD 14 job74]
2020/05/21 19:27:28.568| 93,5| AsyncCallQueue.cc(57) fireNext: leaving AsyncJob::start()
2020/05/21 19:27:28.580| 83,5| PeerConnector.cc(188) negotiate: SSL_connect session=0x1d37d50
2020/05/21 19:27:28.580| 83,5| bio.cc(136) read: FD 14 read 4143 <= 65535
2020/05/21 19:27:28.580| 83,5| Handshake.cc(405) parseExtensions: first unsupported extension: 43
2020/05/21 19:27:28.580| 24,5| BinaryTokenizer.cc(47) want: 1 more bytes for TLSPlaintext.type occupying 1 bytes @4143 in 0x1b94840;
2020/05/21 19:27:28.580| 83,5| Handshake.cc(533) parseHello: need more data
2020/05/21 19:27:28.580| 83,5| PeerConnector.cc(462) noteWantRead: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1
2020/05/21 19:27:28.580| 5,3| comm.cc(559) commSetConnTimeout: local=172.17.0.2:57012 remote=151.101.0.223:443 FD 14 flags=1 timeout 60
2020/05/21 19:27:28.580| 5,5| ModEpoll.cc(117) SetSelect: FD 14, type=1, handler=1, client_data=0x1d18730, timeout=0
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(26) AsyncCall: The AsyncCall logfileFlush constructed, this=0x1d018e0 [call2473]
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(93) ScheduleCall: event.cc(241) will call logfileFlush(0x1895498*?) [call2473]
2020/05/21 19:27:29.129| 41,5| AsyncCallQueue.cc(55) fireNext: entering logfileFlush(0x1895498*?)
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(38) make: make call logfileFlush [call2473]
2020/05/21 19:27:29.129| 41,5| AsyncCallQueue.cc(57) fireNext: leaving logfileFlush(0x1895498*?)
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(26) AsyncCall: The AsyncCall MaintainSwapSpace constructed, this=0x1d018e0 [call2474]
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(93) ScheduleCall: event.cc(241) will call MaintainSwapSpace() [call2474]
2020/05/21 19:27:29.129| 41,5| AsyncCallQueue.cc(55) fireNext: entering MaintainSwapSpace()
2020/05/21 19:27:29.129| 41,5| AsyncCall.cc(38) make: make call MaintainSwapSpace [call2474]
2020/05/21 19:27:29.129| 41,5| AsyncCallQueue.cc(57) fireNext: leaving MaintainSwapSpace()
2020/05/21 19:27:29.651| 41,5| AsyncCall.cc(26) AsyncCall: The AsyncCall memPoolCleanIdlePools constructed, this=0x1d018e0 [call2475]
Ok, so based on my read of this (I don't know anything about squid, but I know a lot about OpenSSL... because life decisions):
Seems like there are two possibilities here: (a) the server is never sending that byte, or (b) something is losing it after the kernel ACKs it at the TCP level.
The server in this case is actually the fastly CDN, so our ability to instrument that side ourselves is a bit limited. I think my next move might be to try to get a pcap, if that's possible?
From the #pypa IRC channel earlier today, seems possibly related:
Hello everyone, we started seeing ssl authentication errors due to this bug: https://bugs.openjdk.java.net/browse/JDK-8213202 , starting on the 14th of May. We are updating our java library version shortly to address this issue. Would anyone know if there was a change to pypi.org on or about 14th of May that could account for this? Thank you.
Thanks, @di, for the issue reference. I'm not sure that a race condition explains what we are seeing, as it happens 100% of the time. But it might be a clue as to some changing server-side behavior because things used to work for us.
@alex glad you made the life decisions you made to help us understand this better. Your read of this sounds correct to me - there seems to be an error in the protocol negotiation and it is one byte short of what it is expecting. I don't think a pcap will help because the traffic is encrypted anyway.
It seems unlikely that this is at the TCP level because we see no issues with any other sites. Seems more like a TLS protocol-level issue.
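For what it's worth, extension 43 from the suspicious log line is "supported_versions", which RFC 8446 defines for TLS 1.3 negotiation; a handshake parser that predates TLS 1.3 (plausibly the squid build in the log above) would report it as unsupported. A minimal sketch of how a TLS extensions block is walked, with a synthetic blob for illustration (not captured traffic):

```python
import struct

# A few IANA TLS ExtensionType values relevant here (RFC 6066 / RFC 8446).
TLS_EXTENSION_NAMES = {
    0: "server_name",
    10: "supported_groups",
    43: "supported_versions",  # the "unsupported extension" in the squid log
    51: "key_share",
}

def parse_extensions(blob):
    """Walk a TLS extensions block: 2-byte type, 2-byte length, then data."""
    exts, i = [], 0
    while i + 4 <= len(blob):
        ext_type, ext_len = struct.unpack_from("!HH", blob, i)
        exts.append(ext_type)
        i += 4 + ext_len
    return exts

# Synthetic block containing supported_versions (43) with a 2-byte body
# selecting TLS 1.3 (0x0304).
blob = struct.pack("!HH2s", 43, 2, b"\x03\x04")
print(parse_extensions(blob))  # [43]
print(TLS_EXTENSION_NAMES[43])  # supported_versions
```

If Fastly recently enabled TLS 1.3 on its edge, that would fit the timeline of a server-side change with no change on pypi.org itself, though that is speculation on my part.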
We've been researching the issue from the squid side as well, and it seems like there are a lot of posts about squid mitm issues on the mailing list: http://squid-web-proxy-cache.1019090.n4.nabble.com/Squid-Users-f1019091.html (the post from hanxie is ours).
On our side, we might attempt a different mitm proxy to see if it changes anything. On the pypi side, I'm curious as to the answer to the IRC user's question. Was there a change to pypi.org this month?
On the pypi side, I'm curious as to the answer to the IRC user's question. Was there a change to pypi.org this month?
No, I'm not aware of anything that we changed which could explain this. I suppose it's possible something changed with our CDN provider (Fastly) but I'm not sure what that would be.
We don't have visibility into when fastly deploys stuff to their edge, but I'd be willing to bet they're constantly making changes :-)
Is this still relevant?
Given that this is quite old and we haven't received similar reports, I'm closing this.
My Platform
We are experiencing timeouts when trying to
https_proxy=http://squid-proxy-host:3128/ curl https://pypi.org/
from a client going through a squid proxy at squid-proxy-host. This only affected proxies in some of our Google Cloud regions until yesterday, but now all of our proxied requests are timing out. Our squid is presently configured as a MITM proxy, but the same behavior seems to happen with or without the MITM.
Network telemetry
Setup:
What is interesting is that a direct
curl https://pypi.org/
coming from the squid host works, but the client request fails with a timeout. Hitting other websites works fine - it is only pypi.org that is failing for us.
DNS Resolution
(From the proxy host)
Traceroutes
IPv4
HTTPS Requests
IPv4
(from the client)
Some proxies are not blocked and work fine. The same exact proxy deployed in a different Google Cloud region causes a timeout.
(from both good and bad proxy servers, the connection works:)
TLS Debug
IPv4
(from proxy server)