sustrik / libdill

Structured concurrency in C
MIT License
1.68k stars 156 forks source link

Getting HTTP protocol request body #204

Open paulofaria opened 4 years ago

paulofaria commented 4 years ago

I'm having trouble changing the webapp example to allow reading the request body. My idea was to detach after receiving the fields, receive the body, attach again to send the status, detach one more time to send the response body. Everything works fine until I try to detach the second time. The program simply blocks and never returns. I'm not sure if this is a bug or a mistake in my understanding of how the library should be used.

coroutine void html_worker(int s) {
    s = http_attach(s);
    assert(s >= 0);

    char command[256];
    char resource[256];
    int rc = http_recvrequest(s, command, sizeof(command),
        resource, sizeof(resource), -1);
    assert(rc == 0);

    printf("%s %s HTTP/1.1\n", command, resource);
    /* This will hold the content length */
    int sz = 0;

    while(1) {
        char name[256];
        char value[256];
        rc = http_recvfield(s, name, sizeof(name), value, sizeof(value), -1);
        if(rc < 0 && errno == EPIPE) break;
        printf("%s: %s\n", name, value);
        if(!strcmp(name, "Content-Length")) sz = atoi(value);
        assert(rc == 0);
    }

    /* Detach so we can read the request body. */
    s = http_detach(s, -1);
    assert(s >= 0);

    /* Read the request body */
    char buf[sz + 1];
    rc = brecv(s, buf, sz, -1);
    assert(rc == 0);
    buf[sz] = '\0';
    printf("\n%s\n", buf);

    /* Attach again so we can send the reply. */
    s = http_attach(s);
    assert(s >= 0);

    /* Send an HTTP reply. */
    rc = http_sendstatus(s, 200, "OK", -1);
    assert(rc == 0);

    /* Perform HTTP terminal handshake. */
    s = http_detach(s, -1); /* <--------- This is where the program blocks forever.
    assert(s >= 0);

    /* Send the HTML to the browser. */
    rc = bsend(s, html, strlen(html), -1);
    assert(rc == 0);

    /* Close the underlying TCP connection. */
    rc = tcp_close(s, -1);
    assert(rc == 0);
}
paulofaria commented 4 years ago

To be more specific it gets blocked in this line: https://github.com/sustrik/libdill/blob/de7a917bc39756f61237b8b779dd8be735652f88/term.c#L168 From what I understand it waits for the client to send more data, which is strange since I already read the whole request body.

jakgra commented 4 years ago

I have the same problem. I didn't find any official way in the documentation and am probably misunderstanding some parts of how the library should be used. I also tried the way described above but failed with the same problem. It worked when I would only http_detach() once and afterwards read the body, but this is very unpractical, because I first need to parse the body to know if I need to respond with 200 or 500 using http_sendstatus() and http_sendstatus()doesn't work after calling http_detach(). Calling http_attach() and http_detach() 2 times doesn't work.

I think the problem is that the TERM protocol expects at least one message to be received, so it waits for it before closing the socket in term_detach() called from http_detach() and doesn't receive it because we have already read all the headers and all the body, when calling http_detach() the second time. Another problem is that http_detach() adds an empty line and curl hangs up if no content-length is specified. So my custom solution for now is adding a few functions to libdill that work around the above described problems. But there is probably some nicer official solution.

Here is the code to my temporary solution: https://github.com/jakgra/libdill/commit/9d5d13d90bc13b7787b0631ec22e3d5a199a561d

The way you use it is the first time you call http_pause() instead of http_detach(). All the other stuff is the same as in the above solution:


  deadline = now() + 2000;

  s = http_attach(s);
  check(s >= 0, http_cleanup);
  rc = http_recvrequest(s, command, sizeof(command), resource, sizeof(resource),
                        deadline);
  check(rc == 0, http_cleanup);
  body_len = -1;
  while (1) {
    char name[256];
    char value[256];
    rc = http_recvfield(s, name, sizeof(name), value, sizeof(value), deadline);
    if (rc < 0 && errno == EPIPE)
      break;
    check(rc == 0, name_cleanup);
    if (strcasecmp(name, "Content-Length") == 0) {
      errno = 0;
      body_len = strtol(value, &endptr, 10);
      check(body_len >= 0, name_cleanup);
      check(errno == 0 && *endptr == '\0', name_cleanup);
    }
  }
  s = http_pause(s, deadline);
  check(s >= 0, pause_cleanup);
  if (body_len > 0) {
    body = bfromcstr("");
    check(body, pause_cleanup);
    // TODO check for max body length :), to prevent malicious RAM exhaustion
    rc = balloc(body, body_len + 1);
    check(rc == BSTR_OK, body_cleanup);
    rc = brecv(s, body->data, body_len, deadline);
    check(rc == 0, body_cleanup);
    body->slen = body_len;
    body->data[body->slen] = '\0';
  } else {
    body = NULL;
  }
  s = http_attach(s);
  check(s >= 0, body2_cleanup);

  // ... handle all the bussines logic and set the response_body string

  /* Send an HTTP reply. */
  rc = http_sendstatus(s, status_code, status, deadline);
  check(rc == 0, final_cleanup);

  rc = http_sendfield(s, "Content-Type", "application/json; charset=utf-8",
                      deadline);
  check(rc == 0, final_cleanup);

  body_len = strlen(response_body);
  length = bformat("%d", strlen(response_body));
  check(length, final_cleanup);
  rc = http_sendfield(s, "Content-Length", bdata(length), deadline);
  bdestroy(length);
  check(rc == 0, final_cleanup);

  /* Perform HTTP terminal handshake. */
  s = http_detach(s, deadline);
  check(s >= 0, final_cleanup);

  /* Send the HTML to the browser. */
  rc = bsend(s, response_body, strlen(response_body), deadline);
  check(rc == 0, final_cleanup);

  /* Close the underlying TCP connection. */
  rc = tcp_close(s, deadline);
  check(rc == 0, final_cleanup);
MobiusHorizons commented 4 years ago

I also have this issue. It's a huge issue that you can't read the body of an http request. Is there really no supported way to do this?

paulofaria commented 4 years ago

@sustrik I know you're very busy, but I kindly ask you to at least reply with a "this is a bug I'm open to PRs" or "I don't expect it to be supported in libdill, create a fork or a new protocol in your application". Just so we have a direction.

sustrik commented 4 years ago

Yes, it seems to be a bug and I am open to PRs.

PythonJedi commented 3 years ago

Seems like I'm getting to experiment with libdill by fixing stuff right off the bat. I haven't tested the client code here against #209 just yet (would make a good candidate for extra test cases for HTTP, I think), it's gotten far later than I intended and I need to get some sleep before work.

The root cause of the issue is that HTTP/1.1 (and presumably 2.0) considers the optional content body that follows the CRLF that terminates the header as part of the message, but we're trying to drop back to the underlying bytestream socket to read that body. TERM expects an empty line for the terminal handshake every time it is detached, so detaching the second time hangs waiting for the client to send another empty line, which it won't if it's only sending one request. Nor will the client close the socket, as it is waiting for the body the server promised would follow.

Regardless of how the HTTP module exposes the message body to application code, it will need to detach the SUFFIX protocol in order to correctly pass through arbitrary octets in the message body (it's not necessarily text), and so anything attached above it also needs to be detached. I added a variant of TERM called HUP (short for hang-up) that allows a peer to detach without receiving a terminal message from the other peer, but ensures that the terminal message is sent if any other messages have been sent. I'm not sure that that's the best approach, since it's easy to read messages as the wrong protocol by detaching early, but at least I think I grok the basics of libdill after implementing a new micro-protocol. It might be better to just drop the use of TERM from HTTP altogether, have it work directly with a SUFFIX socket, and handle the empty lines explicitly. However, I suspect this isn't the only scenario where this kind of behavior is desirable.

I'm not sure that I'd want libdill to add much more functionality to HTTP. It appears to get quite hairy to handle all the edge cases appropriately if we wanted to, say, provide a wrapped socket to read off the body octets in a controlled manner instead of completely detaching the HTTP protocol to read/write the message body. That could easily start rather aggressive feature creeping towards a full-blown REST framework. Not that I wouldn't want such a thing to exist (I think it'd actually be rather fantastic to have a structured concurrency REST backend), but that should be tackled by a separate project, in my opinion. I suspect other protocols like SMTP/POP3/IMAP/SIP/XMPP/etc. would fall into the same category. As soon as we hit the application layer in the networking stack, there's just too much variation to try and have them all live under the same project, and it's arguable whether HTTP should be included in libdill, as it's otherwise just providing transport layers (since SOCKS5 is a proxy). That discussion should probably be in a separate issue, however.