miekg / caddy-prometheus

Prometheus metrics middleware for caddy
Apache License 2.0
65 stars 26 forks source link

Setting status code to rw.Status() side effects other plugins #27

Closed jaredririe closed 7 years ago

jaredririe commented 7 years ago

I tracked down a bug that is caused by #25, which introduced this change:

status, err := next.ServeHTTP(rw, r)
if status == 0 {
    status = rw.Status()
}

Setting the status code can cause unexpected behavior because other plugins interpret status code 0 as an already-written response which is expected when using the proxy plugin.

This bug was exposed when a Java consumer of Caddy with Prometheus enabled threw this exception: org.springframework.web.client.ResourceAccessException: I/O error on PUT request for <url>: <url> failed to respond.

Changing the above to this snippet resolves the issue as it never sets status. stat is used later in responseStatus.WithLabelValues.

status, err := next.ServeHTTP(rw, r)
// proxies return a status code of 0 but the actual status is available on rw
var stat int
if status == 0 {
    stat = rw.Status()
}
miekg commented 7 years ago

Ah that makes sense! Need to double check this because the history for these lines of code is interesting.

On 21 Jun 2017 5:53 pm, "Jared Ririe" notifications@github.com wrote:

I tracked down a bug that is caused by #25 https://github.com/miekg/caddy-prometheus/pull/25, which introduced this change:

status, err := next.ServeHTTP(rw, r) if status == 0 { status = rw.Status() }

Setting the status code can cause unexpected behavior because other plugins interpret status code 0 as an already-written response which is expected when using the proxy plugin.

This bug was exposed when a Java consumer of Caddy with Prometheus enabled threw this exception: org.springframework.web.client.ResourceAccessException: I/O error on PUT request for : failed to respond.

Changing the above to this snippet resolves the issue as it never sets status. stat is used later in responseStatus.WithLabelValues.

status, err := next.ServeHTTP(rw, r) // proxies return a status code of 0 but the actual status is available on rw var stat int if status == 0 { stat = rw.Status() }

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/miekg/caddy-prometheus/issues/27, or mute the thread https://github.com/notifications/unsubscribe-auth/AAVkW2WwsvbBaHbCIwQpGnhqtHmf8bv1ks5sGUqfgaJpZM4OBPcR .

jaredririe commented 7 years ago

I agree that this is worth careful consideration. Handling the status code properly has been one of the harder challenges I've encountered while writing custom Caddy plugins.

I've gleaned most of my understanding of status code handling from the official errors, log, and proxy plugins.

caddy/caddyhttp/errors/errors.go

// returning 0 signals that a response has been written
...
if status >= 400 {
    h.errorPage(w, r, status)
    return 0, err
}

caddy/caddyhttp/log/log.go

if status >= 400 {
    // There was an error up the chain, but no response has been written yet.
    // The error must be handled here so the log entry will record the response size.
    ...
}

caddy/caddyhttp/proxy/proxy.go

// if no errors, we're done here
if backendErr == nil {
    return 0, nil
}

if backendErr == httpserver.MaxBytesExceededErr {
    return http.StatusRequestEntityTooLarge, backendErr
}

Keeping the above snippets in mind, here's an updated suggestion for properly capturing the status code in the Prometheus plugin. This seems to work with my barrage of tests.

// Record response to get status code and size of the reply.
rw := httpserver.NewResponseRecorder(w)
status, err := next.ServeHTTP(rw, r)

var stat int
if err != nil {
    if status == 0 {
        // some middlewares set the status to 0, but return an non nil error: map these to status 500
        stat = 500
    } else {
        // if the proxy encounters an error, it returns the appropriate status code (such as 502)
        stat = status
    }
} else if status == 0 {
    // the proxy returns a status code of 0, but the actual status is available on rw
    stat = rw.Status()
}

One loose end is that I don't like how it's possible for stat to remain unmapped at 0 (when err == nil && status != 0), but I don't think this case comes up in practice and perhaps 0 is the best mapping if it does.

miekg commented 7 years ago

/cc @achernyak

hackeryarn commented 7 years ago

@jaredririe I think in the case of err == nil && status != 0 it would be fine to map status directly to stat. This would allow to always cleanly pull out the stat variable.

Also, feel free to create a PR with your updates. It looks like you've done all the leg work on this, and it checks out on my test also.

miekg commented 7 years ago

Can either with of you wrap this in a PR? Cause LGTM - happy to publish a update to caddy after the merge.

jaredririe commented 7 years ago

Yes, I'll make a PR today.