Closed cwebber closed 9 months ago
The best solution is to use a unix domain socket (basically connecting through a file which acts as a socket) rather than localhost + port. That should work for GNU/Linux and OSX users, but not sure what to do for Windows users. At the least the old behavior could be left as a fallback, though it might leave Windows users vulnerable.
Thanks for opening this issue!
(Note: I edited the first link URL to point to https://lists.gnu.org/archive/html/guile-user/2016-10/msg00007.html which you'd mentioned on IRC, instead of being a dupe of the second link.)
A few quick thoughts:
The "wire protocol" is s-expressions not line-oriented. This might alter the exposure? (But I'm not necessarily saying this reduces it. And I certainly wouldn't claim this eliminates it).
Using domain sockets might be slightly challenging because:
racket-mode currently supports Racket versions as old as 6.0. (Although that could be increased, some Linux distributions have quite old versions, so I like to try to support them.)
racket-mode is delivered as an Emacs MELPA package, not as a Racket package. It tries to avoid deps on anything except the drracket-lib
package. (It can optionally use some packages by using dynamic-require
with some fallback if not found.)
Racket doesn't "natively" support domain sockets. Looking now, I see there is a unix-domain-sockets
package. Great! Unfortunately its info.rkt
seems to require Racket 6.12.
So. This may need to be addressed in multiple facets and stages. I'm not sure when it could be out-of-box secure-by-default for most or even many users. 😦 But we could make it better soon, and more-better later.
p.s. I neglected to include the obligatory, "At racket-mode corporation, we take security very seriously....".
The "wire protocol" is s-expressions not line-oriented. This might alter the exposure? (But I'm not necessarily saying this reduces it. And I certainly wouldn't claim this eliminates it).
Yes, I even tried attacking it by opening up a connection with telnet, but I didn't actually know what protocol it used at all :) I tried simply typing some expressions then enter, but it did nothing.
But I don't actually know what the wire format is... maybe if you can point me to it I can more easily test.
Note that the connection model is not like HTTP, where the server will accept multiple TCP connections at once, and there is a single request/response per connection.
Instead, the command server accepts just one TCP connection at a time. It reads and writes Emacs Lisp s-expressions. The request sexpr has a unique ID cons
ed on the front. The unique ID is cons
ed back to the response. So multiple commands can be "in flight" at once -- multiple requests waiting for responses -- but over just the one connection that Emacs opened.[^1]
The whole server edge is here. It's a lot of comments (hopefully helpful?) then just a couple dozen lines of Racket code.
These are the various specific commands. Note that here the initial ID is stripped off (it's the cdr
of what came over the wire.)
[^1]: The fact that the command server accepts only one connection at a time might mean the exposure is greatly reduced. There's a window when the Racket back end is starting, and the Emacs front end hasn't yet connected. An attack could connect instead? But thereafter, probably not?
Not sure if GitHub sends a new email to you for each edit? If not: I edited the URL in the first link for the server "edge" code. Should be: https://github.com/greghendershott/racket-mode/blob/master/cmds.rkt#L119-L175
In a shell you can do racket run.rkt 55557 '(run nil 2048 nil low nil)'
to run the back end server listening on 55557.
In another shell you can use nc
to connect, type command requests, and see responses.
$ nc 127.0.0.1 55557
(1 path+md5)
(1 ok (top . ""))
(2 def "display")
(2 ok kernel)
(3 def "displayln")
(3 ok ("/Applications/Racket_v6.10/collects/racket/private/misc.rkt" 187 10))
(4 describe "display")
(4 ok "<div><span style=\"color: #C0C0C0\"><i>procedure</i><quote> </quote>Provided from: racket/base, racket | Package: base</span><div class=\"SIntrapara\"><span><table cellpadding=\"0\" cellspacing=\"0\" class=\"boxed RBoxed\"><tbody><tr><td><span><div><span class=\"RktPn\">(</span><span></span><span><i><b>display</b></i></span><span class=\"hspace\"> </span><i>datum</i><span class=\"hspace\"> </span>[<i>out</i>]<span class=\"RktPn\">)</span><span class=\"hspace\"> </span>→<span class=\"hspace\"> </span><i><span>void?</span></i></div></span></td></tr><tr><td><span class=\"hspace\"> </span><i>datum</i><span class=\"hspace\"> </span>:<span class=\"hspace\"> </span><i><span>any/c</span></i></td></tr><tr><td><span class=\"hspace\"> </span><i>out</i><span class=\"hspace\"> </span>:<span class=\"hspace\"> </span><i><span>output-port?</span></i><span class=\"hspace\"> </span>=<span class=\"hspace\"> </span><span class=\"RktPn\">(</span><i><span>current-output-port</span></i><span class=\"RktPn\">)</span></td></tr></tbody></table></span></div><div class=\"SIntrapara\">Displays <i>datum</i> to <i>out</i>, similar to <i><span>write</span></i>,\nbut usually in such a way that byte- and character-based datatypes are\nwritten as raw bytes or characters. If <i>out</i> has a handler\nassociated to it via <i><span>port-display-handler</span></i>, then the handler\nis called. Otherwise, the <span>default printer</span> is used (in <i><span>display</span></i>\nmode), as configured by various parameters.</div><p>See <span>The Printer</span> for more information about the default\nprinter. In particular, note that <i><span>display</span></i> may require memory\nproportional to the depth of the value being printed, due to the\ninitial cycle check.</p><p></p></div>")
(5 exit)
$
So that's a taste of the protocol.
Let's say racket-mode
is already running a live REPL. That uses 55555
by default. A connection is open. Now if you try say $ nc -G 3 127.0.0.1 55555
it will timeout after 3 seconds.
I'm not saying everything is fine. But it's possibly not as bad as it could be.
Now if you try say
$ nc -G 3 127.0.0.1 55555
it will timeout after 3 seconds.
Actually if you use the code on master
then nc
will just sit there, getting no response, as opposed to clearly timing out. That's because tcp-listen
is getting 4
for the max-allow-wait
arg. The example I gave above, I had tried setting the value to 1
. (Thinking it might be belt+suspenders, in addition to what it already does, a single tcp-accept
at a time.)
One thing I do want to change is have the tcp-listen
hostname
argument be 127.0.0.1
instead of #f
.
I see, thank you for the detailed explaination! I'll have to give it a try later, but it only accepting one connection at a time probably limits the severity of this to a much narrower race condition. Probably something good to fix, but not crazy urgent.
As for the sort of mitigation done in that Guile commit, I think the racket-mode server effectively already has that. If it gets a non-sexpr, it closes the connection.
Here's a transcript where the (1 path+md5)
request gets a (1 ok (top . ""))
response -- but a GET / HTTP/1.1
"command" gets disconnected.
$ nc 127.0.0.1 55557
(1 path+md5)
(1 ok (top . ""))
GET / HTTP/1.1
$
Having said that, it would be good to add a comment here explicitly stating that it's desirable to drop the connection and "see issue #327". So that someday no one goes to "improve" this by making it "more robust".
Using domain sockets is safe because that's not something JavaScript code in a browser can do.
What if we accept a normal TCP connection, then challenge the client to do something that JavaScript in a browser cannot do. Is one such thing, writing to an arbitrary local file? If so, could we use that?
For example:
The server sends an opening "banner" sexpr (list (make-temporary-file) (current-seconds))
. For instance that might end up being the text '("/tmp/rkttemp1234" 1535661796)
.
The client must
The server
Any mismatch or discrepancy, and the server closes the connection.
I have a simpler and (I think) stronger proposal.
Assumption: Although a browser enables a remote attacker to use a local network port, it does not enable a remote attacker to run arbitrary programs locally. (If it could, this issue is the least of our worries, right?)
The only valid usage of the command server is when racket-mode's Emacs launches it -- i.e. uses make-comint
to do racket run.rkt <params>
.
Therefore, the Emacs front-end must pass another command-line argument when it launches the back-end: (list 'auth random-value)
. Given the threat model, I think it's OK for random-value
to come from Emacs' (random)
instead of a CSPRNG.
After something connects to the command server, it must send that same s-expression. Unless the launch and connect values match, the server exits. (If they match, the server now accepts commands for the duration of the connection.)
TL;DR: "You may connect if you can demonstrate you are the same entity that launched me."
Thoughts from anyone?
Thinking more: The timing business doesn't add any actual value. I edited the previous proposal in GitHub.
That proposal sounds like it would work to me.
The branch with commit f9712b2 -- connection attempts must supply the same "token" used to launch the server process -- is now merged to master
.
Does this close the issue?
On one hand, I'd say "yes" because I don't see anything more left to do. I think this does a reasonable job of reducing the threat as I understand it.
On the other hand, marking any security issue as "Fixed!! :tada:" feels like tempting fate.
So I'm going to leave this open for now. (Someday, if there's a new specific variation of this threat? Then I might close this issue, and make a new issue that can benefit from a fresh, focused comment thread.)
As a p.s., I think I may have re-invented the X server cookie solution to this problem. :)
Just an update that some details have changed -- but I don't believe this changes the substance or risk.
Since commit 417237c a couple months ago, TCP is used for REPL I/O (as opposed to command I/O). I think this change is a wash:
Either way could execute arbitrary code (e.g. one command is an eval
).
The TCP connection still requires supplying the random authorization code, to demonstrate that the thing making the connection is the same thing that started the back end process.
Since commit 304acf1 a couple weeks ago, the back end picks an available so-called ephemeral TCP port. As a result the port number is less predictable, which is some small improvement -- but certainly not "random" in any strong sense.
Can't the browser be made to ban any connections to the local subnet from outside of it? This seems the only real solution.
It seems the browsers have indeed already solved this issue, as far as I can tell:
https://developer.chrome.com/blog/private-network-access-update/
Private Network Access (formerly known as CORS-RFC1918) restricts the ability of websites to send requests to servers on private networks. It allows such requests only from secure contexts. The specification also extends the Cross-Origin Resource Sharing (CORS) protocol so that websites now have to explicitly request a grant from servers on private networks before being allowed to send arbitrary requests.
Private network requests are requests whose target server's IP address is more private than that from which the request initiator was fetched. For example, a request from a public website (https://example.com) to a private website (http://router.local), or a request from a private website to localhost.
Although I'm always happy to review any security issue, I think the status quo situation wrt Racket Mode is good:
The TCP port number is "ephemeral", chosen by the OS. Of course this alone isn't "random" enough to be sufficient; more importantly...
The back end will terminate a TCP connection unless the client supplies the same token given to the back end when it was launched. (If an attacker can launch programs, that's the more basic vulnerability.) I think this is sufficient; but also...
Even if a client managed to write such a token, the TCP connection uses an s-expression protocol; sending an HTTP request causes the connection to be closed.
But nothing in security is absolute. For example the token in item 2 might be guessable. So although I believe Racket Mode isn't vulnerable to the kind of generic "confused deputy" HTTP attacks that opened this issue, it certainly could vulnerable to a tailored attack.
This is moot ever since commit 2522488 as of which we no longer use a TCP connection.
We had to fix a similar issue in Guile last year. Localhost + port isn't actually safe in that it's possible for the browser to act as a "confused deputy" so that lines of commands can be sent to the listening REPL, thus allowing code execution.