grafana / dskit

Distributed systems kit
Apache License 2.0

Inefficient TCP connections use by memberlist transport #193

Open pracucci opened 2 years ago

pracucci commented 2 years ago

We use a custom transport for memberlist, based on TCP. The main reason we use TCP is to be able to transfer messages that are bigger than the maximum payload of a UDP packet (typically, slightly less than 64KB).

Currently, the TCP transport is implemented inefficiently with regard to TCP connection establishment. For every single packet a node needs to transfer to another node, the implementation creates a new TCP connection, writes the packet, and then closes the connection. See: https://github.com/grafana/dskit/blob/ead3f9308bb7b413ce997182dd4d7c6e038bc68f/kv/memberlist/tcp_transport.go#L438

We should consider alternatives like:

pstibrany commented 2 years ago

One alternative we could explore is reusing the gRPC connection and implementing the Packet (and perhaps Stream, if possible) operations on top of gRPC (as gRPC methods). This would give us connection pooling, remove the need to configure another port, and reuse the gRPC TLS settings.

pstibrany commented 2 years ago

My reason for implementing the TCP transport the way it is was to keep it simple, with no state on the connection. I agree it's not efficient, and it is time to revisit that decision.

stevesg commented 1 year ago

I recently discovered another issue with this, though I'm unsure whether it's an immediate cause for concern: conntrack table utilization. These short-lived connections live on in conntrack for some number of minutes.

A survey of a single node in one of our dev environments at Grafana showed that two thirds (~6000 of ~9000) of the conntrack table were TIME_WAIT dport=7946.

seizethedave commented 6 months ago

Awesome. I would imagine persistent TCP conns would help quite a bit. UDP seems less desirable with intermittently flaky cloud networking, inability to do TLS, ...

zalegrala commented 2 months ago

I thought it might be interesting to see the state of the QUIC protocol. It's been years since I looked, but there is a nice-looking Go implementation with reasonable docs.

I copied tcp_transport.go and replaced some of the details for the handling. I'll see if I can test this soon, but curious what others think along these lines.

There are a few things to like about this protocol, though I have no hands-on experience with it. The Wikipedia page has some good details on what prompted the development and use of a new internet protocol.

For the memberlist use case, it might make a nice optional transport if we wanted to support more than one possibility. I've not considered what it would take to migrate from one memberlist transport to another.

Additionally, it appears that TLS is required, which could be a non-starter for some environments.