racket / web-server

Other
90 stars 47 forks source link

Add current-header-handler and dispatch-logresp #119

Closed sorawee closed 2 years ago

sorawee commented 2 years ago

Add a new parameter current-header-handler which can be used to transform the "header" (including code, message -- everything but the body) right before it will be written to the output port.

Use this new parameter to implement dispatch-logresp, a new dispatcher that consumes another dispatcher, and returns a dispatcher that acts like the input dispatcher, except that if the input dispatcher is dispatched, it will log request and response, just like how the Apache server does.

Still need tests and documentation.

Partially fixes #54

sorawee commented 2 years ago

@jeapostrophe I don't know if this is "clean" enough for you. I handled the response code at a lower level than you suggested, because I also want it to be able to log response code from dispatch-files, etc.

Some concerns:

LiberalArtist commented 2 years ago

I've sometimes wanted (and have sometimes implemented) ways to interpose on response handlers.

I haven't reviewed the code in detail—maybe you've already thought about this!—but one thing I wanted to flag is the use of a parameter for current-header-handler: it's important to think through the details of how parameters interact with continuations and whether that gives you the semantics you want. I was reminded recently on Discourse of an old, bad bug I wrote based on a faulty understanding of that interaction, and I'm pretty sure the buggy implementation—everyone who logged in to the site ended up in the account of the first person to log in after restarting the server—involved in part the ad-hoc response header interposition position point I was using.

I'll try to make time to look at this more closely.

Bogdanp commented 2 years ago

I wonder if it wouldn't be better to extend connection with an additional response-processor-proc field instead of using a param. Your new dispatcher could then add its logger by augmenting (composing its processor onto the processor field of a struct-copyed struct) the connection before passing it down to the next dispatcher. That way, output-response could grab the processor directly off of the conn arg, and we wouldn't need to worry about parameterizations going wrong.

Re. performance, I would suggest setting up a minimal server that displays some text and throwing some load at it with vegeta[1]. Raise your file descriptor limits before running the server and vegeta (ulimit -n unlimited in each terminal session), then progressively hit it as hard as you can on the master branch until you find its limit, then compare that against your branch with or without the new dispatcher. My guess is the change to output-response won't make much of a difference (though using a field instead of a param will probably be faster).

[1]: I like to use tee in the middle of my vegeta pipelines so I can easily save reports and see the results with the same command: cat targets | vegeta attack -insecure -duration=30s -rate 100/s | tee report | vegeta report

sorawee commented 2 years ago

I thought about modifying connection before, but judging from how it's used and how stateful it is, I don't think that it is the right place.

The top-level of the web-server is this:

(define conn
    (new-connection cm ip op (current-custodian) #f))

  (with-handlers ([exn-expected?
                   (lambda (_)
                     (kill-connection! conn))])
    (let connection-loop ()
      (define-values (req close?)
        (config:read-request conn config:port port-addresses))
      (reset-connection-timeout! conn config:response-timeout)
      (set-connection-close?! conn close?)
      (config:dispatch conn req)
      (if (connection-close? conn)
          (kill-connection! conn)
          (connection-loop))))

A connection is established once, and then it is used in a loop which could potentially invokes dispatch multiple times. Hence, a naive dispatcher that installs logging would install it multiple times. We of course can avoid that, but this perhaps suggests that conn is not the right place to store the information.

sorawee commented 2 years ago

@LiberalArtist I tried your minimal example program, and it appears that current-header-handler is not affected by the issue:

;; common.rkt

    #lang racket
    (provide my-param)
    (define my-param
      (make-parameter #f))

;; servlet.rkt

    #lang web-server
    (require "common.rkt")
    (provide start)
    (define (start req)
      (response/xexpr
       `(html (head (title "Example"))
              (body (p ,(format "~v" (my-param)))))))

;; run.rkt

#lang racket
(require "common.rkt"
         "servlet.rkt"
         web-server/http
         web-server/servlet-dispatch
         web-server/http/response)

(define ((make-header-handler orig request) resp)
  (orig (struct-copy response resp
                     [message (string->bytes/utf-8 (~s 'hello:
                                                       (request-uri request)))])))

(define ((wrap-dispatcher inner) connection request)
  (define orig (current-header-handler))
  (parameterize ([my-param (request-uri request)]
                 [current-header-handler (make-header-handler orig request)])
    (inner connection request)))

(serve/launch/wait (λ (quit-sema)
                     (wrap-dispatcher
                      (dispatch/servlet start
                                        #:stateless? #t)))
                   #:banner? #10 

And here are the results

$ curl http://localhost:8000/foo --verbose
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /foo HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.79.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 hello: #(struct:url #f #f #f #f #t (#(struct:path/param "foo" ())) () #f)
< Transfer-Encoding: chunked
< Server: Racket
< Last-Modified: Sun, 12 Jun 2022 21:00:08 GMT
< Date: Sun, 12 Jun 2022 21:00:08 GMT
< Content-Type: text/html; charset=utf-8
<
* Connection #0 to host localhost left intact
<html><head><title>Example</title></head><body><p>(url #f #f #f #f #t (list (path/param "foo" '())) '() #f)</p></body></html>⏎

$ curl http://localhost:8000/bar --verbose
*   Trying 127.0.0.1:8000...
* Connected to localhost (127.0.0.1) port 8000 (#0)
> GET /bar HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/7.79.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 hello: #(struct:url #f #f #f #f #t (#(struct:path/param "bar" ())) () #f)
< Transfer-Encoding: chunked
< Server: Racket
< Last-Modified: Sun, 12 Jun 2022 21:00:12 GMT
< Date: Sun, 12 Jun 2022 21:00:12 GMT
< Content-Type: text/html; charset=utf-8
<
* Connection #0 to host localhost left intact
<html><head><title>Example</title></head><body><p>(url #f #f #f #f #t (list (path/param "foo" '())) '() #f)</p></body></html>⏎

Note that both "struct:path/param"s in the response headers are correct. The bodies are of course incorrect, as you indicated in the mailing list.

sorawee commented 2 years ago

This is now ready for review.

jeapostrophe commented 2 years ago

Thank you!