yegor256 / takes

True Object-Oriented Java Web Framework without NULLs, Static Methods, Annotations, and Mutable Objects
https://www.takes.org
MIT License
805 stars 197 forks source link

\r and \n preceding the HTTP method in a HTTP request trip up the parser #1295

Open EugenDueck opened 4 months ago

EugenDueck commented 4 months ago

As mentioned over at https://github.com/yegor256/takes/pull/1294#issuecomment-2092061121, I wondered why BkBasicTest.handlesTwoRequestInOneConnection() succeeds, despite the Content-Length of the first request not counting the \r\n that Joined() inserts between "Hello First" and "POST" . So it should be 13, not 11, which it is now:

socket.getOutputStream().write(
    new Joined(
        BkBasicTest.CRLF,
        BkBasicTest.POST,
        BkBasicTest.HOST,
        "Content-Length: 11",
        "",
        "Hello First",
        BkBasicTest.POST,
        BkBasicTest.HOST,
        "Content-Length: 12",
        "",
        "Hello Second"
    ).asString().getBytes()
);

When you change the test to print head and body like so:

new BkBasic(req -> {
    RqPrint rqPrint = new RqPrint(req);
    System.out.println("-------- REQUEST HEAD -------- \n" + rqPrint.printHead());
    System.out.println("-------- REQUEST BODY -------- \n" + rqPrint.printBody());
    return new TkText(text).act(req);
}).accept(
    server.accept()
);

it turns out the second request has no headers, except for the synthetic ones added by Takes. Instead, the headers end up in the body. To wit:

1st request

-------- REQUEST HEAD -------- 
POST / HTTP/1.1
Host:localhost
Content-Length: 11
X-Takes-LocalAddress: 192.168.32.222
X-Takes-LocalPort: 61791
X-Takes-RemoteAddress: 192.168.32.222
X-Takes-RemotePort: 61792

-------- REQUEST BODY -------- 
Hello First

2nd request

-------- REQUEST HEAD -------- 
X-Takes-LocalAddress: 192.168.32.222
X-Takes-LocalPort: 61791
X-Takes-RemoteAddress: 192.168.32.222
X-Takes-RemotePort: 61792

-------- REQUEST BODY -------- 
POST / HTTP/1.1
Host:localhost
Content-Length: 12

Hello Second

When changing the content length of the first request to the correct value of 13, headers of the second request will be parsed correctly.

I think in case of \r or \n preceding the http method, Takes should either

  1. disconnect, because the request is invalid
  2. discard those leading \r and \n characters and otherwise parse the request as normal

I just checked google.com's "gws" server and an nginx-alpine docker image: They both implement 2., discarding leading \r and \n characters, regardless of how many there are and in what order:

> echo -ne 'GET / HTTP/1.1\r\nHost: google.com\r\n\r\n' | nc google.com 80 | head -1
HTTP/1.1 301 Moved Permanently
> echo -ne '\rGET / HTTP/1.1\r\nHost: google.com\r\n\r\n' | nc google.com 80 | head -1
HTTP/1.1 301 Moved Permanently
> echo -ne '\nGET / HTTP/1.1\r\nHost: google.com\r\n\r\n' | nc google.com 80 | head -1
HTTP/1.1 301 Moved Permanently
> echo -ne '\r\nGET / HTTP/1.1\r\nHost: google.com\r\n\r\n' | nc google.com 80 | head -1
HTTP/1.1 301 Moved Permanently
> echo -ne '\r\n\r\r\n\n\nGET / HTTP/1.1\r\nHost: google.com\r\n\r\n' | nc google.com 80 | head -1
HTTP/1.1 301 Moved Permanently

But they don't discard a leading space:

> echo -ne ' GET / HTTP/1.1\r\nHost: google.com\r\n\r\n' | nc google.com 80 | head -1
HTTP/1.0 400 Bad Request

So, I suggest that Takes follow the models of gws and nginx and discard leading \r and \n.