python / cpython

The Python programming language
https://www.python.org
Other
62.28k stars 29.92k forks source link

socketpair() bugfix leaves ability for attacker to infinitely block the function under some limited circumstances #122797

Open ell1e opened 1 month ago

ell1e commented 1 month ago

Bug report

Bug description:

(This issue concerns https://github.com/python/cpython/issues/122133 https://github.com/python/cpython/commit/78df1043dbdce5c989600616f9f87b4ee72944e5 )

I reported the initial issue of socketpair() and it was suggested to me to bring up here that the fix seems incomplete to me. It seems to me an attacker could still try to intercept the socketpair creation and cause a hang in this line: https://github.com/python/cpython/blob/78df1043dbdce5c989600616f9f87b4ee72944e5/Lib/socket.py#L647 (by beginning to set up the TCP connection but not completing the initial handshake process, which I assume should be possible with the right use of "raw sockets" and lowlevel packets rather than the normal highlevel connect())

I think this is a problem despite the normal expectation that most socket functions can almost always block or hang, because a user wouldn't expect socketpair() specifically to hang but rather either work or almost instantly fail since it's a local operation.

There are probably some caveats why this is a minor issue:

  1. On most systems there is a default timeout of 20 seconds or so, this should still apply here. But on systems where there isn't, this would possibly hang forever. Even if there is a timeout, 20 seconds will likely also too long for what most applications would usually expect this function to take.

  2. It might require root or CAP_NET_RAW on most systems to exploit. Nevertheless, this could be the case for some network diagnostic tooling the user runs e.g. in flatpak that was meant to still have sandboxing and isolation that might be impacted by this.

  3. Unlike the original problem it would therefore require local access, since I assume the JavaScript access to localhost sockets from random websites usually wouldn't have CAP_NET_RAW.

A possible fix would be to either set non-blocking on the server socket as well and do the accept() in a small short busy loop before failing almost instantly, or to set a short socket timeout (probably under 0.5 seconds) if that one is platform independent enough to do the job. My own code suggestion I sent some weeks back has a timeout longer than 1 second but that's probably too long given what the average application would be reasonably expecting.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

gpshead commented 1 month ago

We should not have any form of busy loop.

Localhost processes with CAP_NEW_RAW equivalent access needed to do a partial TCP handshake already have sufficiently elevated privileges that doing this seems like one of the lesser things they could do to another process on the system. It's a mere local non resource wasting denial of service at best.

I suggest creating a Windows proof of concept. Ie: Does its TCP stack even hang in an accept() call because of an incomplete handshake. Does the listen() queue fill up with partially constructed but incomplete connections at all? accept() should only return when a completed SYN/SYNACK/ACK handshake has occurred - regardless of any others in progress. It shouldn't be waiting for the first SYN received to only return that connection rather than ones simultaneously happening that actually complete theres. All of that is at the OS TCP stack level.

ell1e commented 1 month ago

accept() should only return when a completed SYN/SYNACK/ACK handshake has occurred - regardless of any others in progress. It shouldn't be waiting for the first SYN received to only return that connection rather than ones simultaneously happening that actually complete theres.

If that is correct and accept really won't ever do anything further than instantly return in that situation then I was wrong and the issue doesn't exist. That would be nice, of course! I guess it's hard to check with Windows, given the code isn't legally available to the public. You wouldn't happen to know if this is documented somewhere?

As an additional note, I feel like maybe it might be a good idea to just use a non-blocking or timeout approach anyway just to be safe. It feels like a more defensive approach here could be useful, e.g. if this code ever is used for other operating systems where maybe the behavior is different.

(Sorry, I had briefly duplicated this comment by accident. I apologize for the notification spam)

ell1e commented 1 month ago

For what it's worth, I personally think this statement isn't necessarily true for some reasons, e.g. related to Windows containers:

Localhost processes with CAP_NEW_RAW equivalent access needed to do a partial TCP handshake already have sufficiently elevated privileges that doing this seems like one of the lesser things they could do to another process on the system. It's a mere local non resource wasting denial of service at best.

But I don't want to waste everyone's time going into that if the hang I pointed out isn't actually possible.

gpshead commented 1 month ago

I'm having trouble coming up with authoritative references on how the Windows TCP stack behaves, but in general IIUC the use of a OS-wide "SYN queue" or the use of SYN cookies should be common for partial "ESTABLISHING" state connections. The size of that queue (if one exists) is not related to a specific listen() call. The queue accept() pulls from is generally considered to be the established connection queue or accept queue, the size of which can be adjusted based on the listen() call's second queue size parameter. https://security.stackexchange.com/questions/43692/tcp-syn-attack-prevention is the closest thing to a Windows specific reference I could find. @zooba or @eryksun may be able to fill in more Microsoft stack specific details.

zooba commented 1 month ago

I'm out of my depth here (wouldn't even know what to look for in the Windows sources, though I can), but I don't really see how looping over a short wait is that much worse than a long wait? In most cases we'd expect a successful loopback connection to be near instantaneous, so even a short wait should return and we break out. But then, assuming you are under active attack, having a quick break out is potentially worse than a slow one, as you're far more likely to detect the attack.

There does seem to be a (configurable) limit to the number of half open connections allowed, which to me says we should ignore the possibility ourselves and let the OS deal with it - a setting like this more clearly indicates the user's intent than anything we would do automatically, and if we work around their limit then we're just working against their intent. Best to play dumb and let things work the way they set it up.

ell1e commented 1 month ago

As for the impact: a long wait seems worse to me since a quick or instant failure should be an expected code path for the caller, unlike long wait.

Since people don't seem to be fully sure about what's guaranteed here, I personally still think doing a short loop here would be preferable. But I have nothing more of use to add.