Question about slow handshakes and rate limiting.

drwhut commented 2 months ago

Hello! Thank you for making this crate, it's come in really handy for my server application :grin:

I'm at the final stages of developing my server, and now I'm trying to think of different ways that malicious actors could mess with it. My particular point of concern at the moment is when the server is accepting connections. Currently, there is a main "server" task that waits for TCP streams to be accepted, and then for each one it spawns a separate task for doing the WebSocket handshake.

In that separate task, there are two points where it awaits a future: one for creating a TLS stream, and one for creating the WebSocket stream. My concern is that if a custom client were to be written that deliberately completes the handshake as slowly as possible, it could cause this task to wait for way longer than anticipated. What would be the best way to deal with this? Would it be as simple as adding a timeout around the await, e.g. timeout(Duration::from_secs(5), accept_async_with_config(...))? Or is there a potentially cleaner way of dealing with it? Or is it not possible for the client to do a handshake slowly?

If this is a potential issue that needs to be resolved, this also opens up the possibility that clients could create lots of connections to the server and deliberately stall all of the spawned tasks. I haven't seen a built-in way to rate-limit connections in either tokio-tungstenite or tokio - is there a crate that you know of that does this automatically? Or is this something that would need to be implemented manually?
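For the "implemented manually" route, a cap on the number of handshakes in flight could look roughly like the sketch below. It uses only the standard library; `HandshakeLimiter` and `Permit` are hypothetical names invented for this illustration, not part of tokio or tokio-tungstenite. In async code, `tokio::sync::Semaphore` would likely fill the same role.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Caps how many handshakes may be in flight at once (hypothetical helper).
#[derive(Clone)]
pub struct HandshakeLimiter {
    in_flight: Arc<AtomicUsize>,
    max: usize,
}

/// Held for the duration of one handshake; dropping it frees the slot.
pub struct Permit {
    in_flight: Arc<AtomicUsize>,
}

impl HandshakeLimiter {
    pub fn new(max: usize) -> Self {
        Self { in_flight: Arc::new(AtomicUsize::new(0)), max }
    }

    /// Returns `None` when the cap is reached, so the caller can close the
    /// freshly accepted socket instead of spawning another handshake task.
    pub fn try_acquire(&self) -> Option<Permit> {
        let max = self.max;
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n < max { Some(n + 1) } else { None }
            })
            .ok()
            .map(|_| Permit { in_flight: Arc::clone(&self.in_flight) })
    }

    /// Number of permits currently held.
    pub fn in_flight(&self) -> usize {
        self.in_flight.load(Ordering::Acquire)
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        // Release the slot when the handshake finishes, fails, or times out.
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

The accept loop would call `try_acquire()` right after accepting a TCP stream and move the returned `Permit` into the spawned handshake task, so a stalled handshake occupies a slot only until the task ends.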

Thank you in advance!

agalakhov commented 2 months ago

Hello,

the problem of slow handshakes is not WebSocket-specific. There are many known attacks based on this, including the so-called SYN flood, where the client initiates a connection with no intent of completing it.

Fortunately, in most cases you don't have to do anything about that. Since these attacks are common, most of them, if not all, are already handled at the operating system level. There are socket timeouts with reasonable defaults, but you can also set them manually according to your needs. For example, there is SO_RCVTIMEO in setsockopt() (note that it only bounds blocking reads, so on the non-blocking sockets Tokio uses it does not limit how long an .await can take). Some options are exposed in Tokio, some aren't, but you can always get access to all of them using the raw socket descriptor and an unsafe setsockopt() call.
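As a small illustration of a socket-level read timeout, the sketch below uses the standard library's blocking `TcpStream::set_read_timeout`, which on Unix is implemented with SO_RCVTIMEO. The function name `read_from_silent_peer` is made up for this example. It applies to blocking sockets only; Tokio's sockets are non-blocking, so this option would not bound an `.await` there.

```rust
use std::io::{self, Read};
use std::net::{TcpListener, TcpStream};
use std::time::Duration;

/// Accepts one loopback connection from a client that never sends anything,
/// then attempts a blocking read bounded by `limit`.
pub fn read_from_silent_peer(limit: Duration) -> io::Result<usize> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // The client connects and then stays silent for as long as it is in scope.
    let _client = TcpStream::connect(listener.local_addr()?)?;
    let (mut stream, _peer) = listener.accept()?;

    // Sets the receive timeout on the accepted socket.
    stream.set_read_timeout(Some(limit))?;

    let mut buf = [0u8; 64];
    // With no data arriving, this returns an error once `limit` elapses:
    // `WouldBlock` on Unix, `TimedOut` on Windows.
    stream.read(&mut buf)
}
```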

Most users don't need to configure anything at all. First, operating system defaults are configurable as well. For example, under Linux they are in the /proc/sys/net/ virtual filesystem. Second, a production system is likely to have some front-end reverse proxy like Nginx which forwards connections to your application, and it has configuration files for setting timeouts among other options.

Sometimes, however, it is a good idea to handle timeouts explicitly in the production code. The main concern is: what happens if the malicious user obeys these timeouts by simulating some activity but does not do anything meaningful? Imagine there are WebSocket pings sent all the time but nothing else. It is likely to happen after the connection handshake, but maybe there are some tricks for the handshake itself as well, e.g. if your code is written to follow HTTP redirects. In order to close the connection gracefully you should call the shutdown() method of TcpStream. You can achieve this by creating a future that runs in parallel to the connection (hint: there is the tokio::select! macro) which acts like a watchdog. This is not limited to just the connection phase and can also watch the whole application process if desired.
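To make the "obeys the timeouts but does nothing meaningful" case concrete, here is a blocking, std-only sketch of an overall deadline. A client that drips one byte at a time never trips a per-read timeout, but it still runs out of a total budget. `read_head_with_deadline` and `serve_one` are hypothetical helpers written for this illustration; in async code the same idea would more likely be a `tokio::time::timeout` around the handshake or a `tokio::select!` branch on a sleep.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Reads until the blank line that ends an HTTP request head, under ONE
/// overall deadline. Reads byte by byte for simplicity, not for speed.
pub fn read_head_with_deadline(stream: &mut TcpStream, budget: Duration) -> io::Result<Vec<u8>> {
    let deadline = Instant::now() + budget;
    let mut head = Vec::new();
    let mut byte = [0u8; 1];
    loop {
        // The remaining budget shrinks no matter how active the peer looks.
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return Err(io::Error::new(ErrorKind::TimedOut, "handshake deadline exceeded"));
        }
        stream.set_read_timeout(Some(remaining))?;
        match stream.read(&mut byte) {
            Ok(0) => return Err(ErrorKind::UnexpectedEof.into()),
            Ok(_) => {
                head.push(byte[0]);
                if head.ends_with(b"\r\n\r\n") {
                    return Ok(head);
                }
            }
            Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => {
                return Err(io::Error::new(ErrorKind::TimedOut, "handshake deadline exceeded"));
            }
            Err(e) => return Err(e),
        }
    }
}

/// Demo driver: a loopback client sends `payload` one byte per `gap`, and the
/// server side reads it with the given total `budget`.
pub fn serve_one(payload: &'static [u8], gap: Duration, budget: Duration) -> io::Result<Vec<u8>> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let client = thread::spawn(move || {
        if let Ok(mut s) = TcpStream::connect(addr) {
            for b in payload {
                if s.write_all(&[*b]).is_err() {
                    break; // server closed the connection
                }
                thread::sleep(gap);
            }
        }
    });
    let (mut stream, _peer) = listener.accept()?;
    let result = read_head_with_deadline(&mut stream, budget);
    drop(stream); // close the socket so a slow client's writes start failing
    let _ = client.join();
    result
}
```

With a 50 ms gap between bytes and a 200 ms budget, the client is always "active" yet the read still fails with `TimedOut`; the same request sent without delays completes normally.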

Hope this helps.

drwhut commented 2 months ago

Thank you for the detailed response!

It makes a lot of sense that these kinds of attacks would be dealt with at such a low level, since like you said, this applies to a lot of protocols, not just WebSocket.

I didn't even think about using Nginx as a reverse proxy for the server! I already use it for my website and forum, and apparently you can use Nginx with WebSocket connections by setting certain headers, so I'll definitely be giving it a try. At the very least, it should work a lot better than if I implement it myself.

I had thought before about the scenario where the client just stays connected forever. In my specific case, I'm making a server for my game, in which clients use the server to establish peer-to-peer connections with each other by means of a special "room code". It's possible that a multiplayer session could last several hours, so in the past I figured there was not much point in setting a timeout, but if I were to set one it could be something like 12-24 hours?

Thanks again! :grin:

snapview / tokio-tungstenite

Question about slow handshakes and rate limiting. #347