ruby / openssl

Provides SSL, TLS and general purpose cryptography.

remove file check to support proxied SSL connection #731

Open HoneyryderChuck opened 5 months ago

HoneyryderChuck commented 5 months ago

Hi, I'm implementing HTTPS connect proxy support in httpx, which is supposed to connect via SSL to the proxy, send the CONNECT request, and then negotiate a second SSL connection to the target through that tunnel.

This seems to be impossible to do however, given that SSLSocket initialization checks whether IO is a file, rather than something that quacks as a file.
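The check can be observed from Ruby. This is a minimal sketch, assuming a gem version that still enforces the file check; `StringIO` stands in for any object that only quacks like a socket (an established `SSLSocket` would be another), and the helper name `ssl_wrappable?` is hypothetical:

```ruby
require "openssl"
require "stringio"

# Hypothetical helper: reports whether SSLSocket.new accepts the given object.
# With the file check in place, an object that is not a real IO is expected
# to be rejected with TypeError before any handshake is attempted.
def ssl_wrappable?(io)
  OpenSSL::SSL::SSLSocket.new(io, OpenSSL::SSL::SSLContext.new)
  true
rescue TypeError
  false
end

r, _w = IO.pipe
puts "real IO (pipe): #{ssl_wrappable?(r)}"
puts "StringIO:       #{ssl_wrappable?(StringIO.new)}"
```

The second result depends on the installed openssl gem version, so no fixed output is claimed here.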

FWIW I'm trying to accomplish the same feature as this curl example here:


> curl -x https://proxyuser:password@proxyhost:9080 https://example.com/get --proxy-cacert /proxy/ca.crt

rhenium commented 5 months ago

Currently, SSLSocket passes the file descriptor to OpenSSL and doesn't use Ruby's IO interface.

https://github.com/ruby/openssl/blob/45c731c5bbada6be9eb84013939dd74c3779acdd/ext/openssl/ossl_ssl.c#L1688-L1689

OpenSSL uses an abstraction layer called BIO to interact with the underlying socket. Here is the default implementation (a "BIO method" - SSL_set_fd() will set up the BIO_s_socket()): https://github.com/openssl/openssl/blob/master/crypto/bio/bss_sock.c

It's more work than just removing a check, but it should be possible to define a custom BIO method that wraps an IO-ish object (as long as it supports non-blocking IO methods).
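To make "IO-ish object" concrete, here is a rough sketch of what such a transport might look like on the Ruby side. The class name `TunnelTransport` and the exact set of methods are assumptions for illustration; the thread does not specify which methods a custom BIO method would call.

```ruby
require "socket"

# Hypothetical IO-like transport: not an IO subclass, but it responds to the
# non-blocking calls a custom BIO method would presumably delegate to.
class TunnelTransport
  def initialize(inner)
    @inner = inner # anything with read_nonblock/write_nonblock, e.g. an SSLSocket
  end

  def read_nonblock(maxlen, exception: true)
    @inner.read_nonblock(maxlen, exception: exception)
  end

  def write_nonblock(data, exception: true)
    @inner.write_nonblock(data, exception: exception)
  end

  def close
    @inner.close
  end
end

# A local socket pair stands in for the proxy connection.
left, right = UNIXSocket.pair
transport = TunnelTransport.new(left)

p transport.read_nonblock(16, exception: false) # nothing has been sent yet
right.write("ping")
p transport.read_nonblock(16, exception: false) # the bytes written above
p transport.is_a?(IO)                           # the wrapper is not a File/IO
```

In the proxy scenario, `inner` would be the already established `SSLSocket` to the proxy, and the wrapper is what the second TLS session would need to run over.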

HoneyryderChuck commented 5 months ago

I was anticipating it to be much harder than just removing the check, no worries.

To better illustrate what I'm looking for, here's a practical example using docker: I've just pushed the "infra setup" changes to this MR branch. Once you download the repo and switch to the "https-proxy" branch:

And you should be in a position to replicate the issue. You can run:

> curl -x https://proxyuser:password@httpsproxy:9080 \
          --proxy-cacert test/support/ci/certs/ca.crt \
          https://nghttp2/get -v

to verify what/how curl does it. To replicate what at least I'd expect to work in ruby, you can run this script:

require "socket"
require "openssl"
require "base64"

AUTH = Base64.strict_encode64("proxyuser:password")

CONNECT_REQ = <<-CONNECT.gsub("\n", "\r\n")
CONNECT nghttp2:443 HTTP/1.1
Host: nghttp2
Proxy-Authorization: Basic #{AUTH}

CONNECT

REQ = <<-REQ.gsub("\n", "\r\n")
GET /get HTTP/1.1
User-Agent: smth
Accept: */*
Connection: close
Host: nghttp2

REQ

tcp = TCPSocket.new("httpsproxy", 9080)
ctx = OpenSSL::SSL::SSLContext.new
ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
ssl.hostname = "httpsproxy"
ssl.sync_close = true
ssl.connect
ssl.post_connection_check("httpsproxy")

ssl.write(CONNECT_REQ)

response = ssl.readpartial(2048)

puts "response from proxy connect"
puts response.inspect

raise unless response.start_with?("HTTP/1.1 200")

ssl.write(REQ)

puts ssl.readpartial(2048)

which currently breaks on the last line with ...openssl/buffering.rb:157:in `sysread': end of file reached (EOFError)

HoneyryderChuck commented 5 months ago

@rhenium just opened an MR to explore a possible approach to solving this. LMK what you think, and whether you think there are other alternatives worth pursuing.

rhenium commented 5 months ago

I was exploring the approach to implement a BIO_METHOD that simply translates those Ruby IO methods. It's not reviewed carefully, but here is a WIP PR: #736

Thanks for the pointer to usage of the memory BIO in Python's ssl library. I didn't know about it, and that approach is interesting. While it is slightly inefficient due to one additional data copy, if I understand it right, it would let us interact with the IO-like object outside of SSL_{read,write,..}() and reduce the amount of rb_protect() state variables churn. It's quite a mess in my WIP PR.

HoneyryderChuck commented 5 months ago

Hey @rhenium, after a bit of back-and-forth with my approach, hitting the limits of my OpenSSL C API expertise, and researching other implementations, I'm leaning towards your BIO_METHOD approach as the most viable.

To begin with, it works! I've tested it with the script pasted above against a server behind-a-squid-behind-stunnel, and even if still a rough patch, it worked the first time. So that's something I haven't been able to produce.

I've also been looking at how curl works on top of openssl, and it seems that it also ships with its own BIO_METHOD. The main advantage seems to be that SSL_connect usage "just works", whereas you'd have to "drop down" a layer of abstraction if you keep memory BIOs around (SSL_connect was the hurdle I never got past in my attempt). This seems to be how cpython works with openssl, where different APIs are used (SSL_write_ex instead of SSL_write, and SSL_do_handshake instead of SSL_connect, just to name a couple).

I also find that approach to be a better template to implement on top of openssl QUIC connections; not sure how much you've considered introducing them at this point though.

About rb_protect usage, I'm not sure you really need it, considering you're using the "exception: false" variants of the non-blocking APIs? Moreover (and a bit tangential), having researched the cpython implementation (a bit superficially, I admit), I was wondering why the gem can't release the GVL more often, for example when calling the SSL_* family of APIs (python examples here and here), as I believe roughly the same constraints apply?
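For reference, this is how the "exception: false" variants behave on a plain socket. It is a small sketch using a local socket pair, and it says nothing about what the WIP PR actually does. Would-block and end-of-stream come back as return values, but other failures still raise, which may be one reason some protection is still needed around calls into Ruby from C.

```ruby
require "socket"

reader, writer = UNIXSocket.pair

# Would-block is reported as a symbol instead of raising IO::WaitReadable.
would_block = reader.read_nonblock(16, exception: false)

# End of stream is reported as nil instead of raising EOFError.
writer.close
at_eof = reader.read_nonblock(16, exception: false)

# Other failures still raise, e.g. reading from an already closed socket.
reader.close
closed_error =
  begin
    reader.read_nonblock(16, exception: false)
  rescue IOError => e
    e.class
  end

p would_block
p at_eof
p closed_error
```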

Thx for the help so far, and LMK how I can help move this forward! 🙏

Aubermean commented 2 weeks ago

Nudging this, any ideas @rhenium ?