grepplabs / kafka-proxy

Proxy connections to Kafka cluster. Connect through SOCKS Proxy, HTTP Proxy or to cluster running in Kubernetes.
Apache License 2.0
501 stars 87 forks source link

Client certificates check between proxy client and brokers client #37

Closed mgusiew-guide closed 4 years ago

mgusiew-guide commented 4 years ago

Hi,

Here is my use case. I need to proxy AWS MSK (Managed Service Kafka) in order to connect it from any network (MSK connectivity options are bit limited as of now). MSK cluster is created with mutual TLS, this is because it is multitenant and each tenant has its own client cert. Client certs for all tenants are signed by same CA (CA certificate monthly fee is pretty high so this is cost optimization). I have one proxy group for each tenant and still want to maintain TLS between proxy client and proxy.

Kafka-proxy offers a possibility to terminate mutual TLS on proxy and also to start mutual TLS on proxy so those two capabilities can be used together in order to make sure that: 1) Client connects with client cert 2) Client credentials are propagated to server I would like proxy to terminate client TLS and start TLS with same certificate

I can start TLS on proxy with selected certificate (tls-client-cert-file) so this part is fine On proxy listener side I can validate that proxy client cert is signed with given CA (proxy-listener-ca-chain-cert-file) but I am not able to check that proxy client cert is same as tls-client-cert-file. This is needed so that tenantA is not able to use same proxy group as tenantB (They are signed with same CA)

I did some hacking on kafka-proxy and came up with some prototype that is backwards compatible by default but allows to check for same client cert if needed (new config flag). The code is pretty straightforward and available here: https://github.com/mgusiew-guide/kafka-proxy/tree/SameClientCertEnable

Unfortunately I did not create any automated tests for now (just tested manually) but that will follow, just wanted to have something in order to start discussion.

Would you be interested in such contribution ? I could redesign and add some tests based on your review.

Let me know what you think

everesio commented 4 years ago

Thank you for reporting. All contributions are welcome. I will check your code soon.

mgusiew-guide commented 4 years ago

Hi @everesio, just wondering if you had a chance to look at the code ?

everesio commented 4 years ago

Hi @mgusiew-guide, thank you for the reminder and I am sorry it took me so long to have a look.

In principle I like the idea and it could be integrated. There are some small things like unnecessary formatting flags in error texts e.g. '%v' in the errors.New("proxy server: handshake failed: %v"). Some tests would be fine of course ;-)

Maybe what I am missing is the timeout handling for the Handshake() in the handshakeAsTLSAndValidateClientCert Please check the tlsDialer. As a configuration the DialTimeout could be theoretically reused.

mgusiew-guide commented 4 years ago

Hi @everesio ,

Thank you for expressing interest in integrating this feature.

I'm going to add some tests, improve error messages and also look at dial timeout as suggested.

I will get back to you once I apply changes :)

mgusiew-guide commented 4 years ago

Hi @everesio ,

I created new commit (code available here: https://github.com/mgusiew-guide/kafka-proxy) with tests, improved docs and come changes to error messages.

WRT handshake timeout I did some research and need to discuss further. It is easy to set handshake timeout from client side but in this case we have server side (the only reason for the server initiated handshake is to get client certificate that is used by proxy client). I looked at tlsConfig and did not find anything. I also checked net.Conn and it looks like it allows to call SetReadDeadline, SetWriteDeadline and SetDeadline (while tls.Conn allows only for SetDeadline). I could set one of these but it looks to me more generic than just handshake timeout (my understanding is that read timeout is for handling whole request and write timeout includes repsonse). Therefore I decided not to change it for now but rather discuss.

Let me know what you think.

everesio commented 4 years ago

The handshakeAsTLSAndValidateClientCert should wait for a TLS handshake on the local connection i.e. it will act as server and wait for client HELLO. IMO you can use SetDeadline on the connection, but at the end of the method should be reset / set to 0.

Please create PR and I will have check you improvements.

Thank you for all your efforts !

mgusiew-guide commented 4 years ago

I agree that timeout is good in order to protect against malicious attacks that may try to delay handshake, just missed the reset part :). Therefore I added handshake timeout in new version (I used dial timeout and applied reset as discussed earlier in this thread). I also merged all the commits, cleaned up the code a little bit and created pull request https://github.com/grepplabs/kafka-proxy/pull/42 .

Please let me know in case if anything else is needed.

Thanks for your support !

everesio commented 4 years ago

Release v0.2.3

mgusiew-guide commented 4 years ago

Great news ! Thanks for your time @everesio !

We are using Docker images, I confirm that grepplabs/kafka-proxy:latest supports new flag. That being said we are currently using grepplabs/kafka-proxy:kafka-2.3.0 (for Kafka 2.3.0) and grepplabs/kafka-proxy:kafka-2.4.0 (for Kafka 2.4.0) images and unfortunately they are based on older kafka-proxy release. Is there a plan for analogical images that support new features or should we make them ourselves. In case if we need to create images ourselves it would be great to get some instructions on how to do that.

Thank you for your support !

everesio commented 4 years ago

BTW. You can use latest version kafka-proxy or latest image grepplabs/kafka-proxy:v0.2.3 it should support all versions up to kafka 2.5.0 (between 2.4.0 and 2.5.0 there was no protocol changes). Protocol version negotiation happens during handshake so server / client versions can mismatch.