willnorris / imageproxy

A caching, resizing image proxy written in Go
Apache License 2.0
3.53k stars 481 forks source link

Memory Leak with invalid status code #316

Open Jorgevillada opened 2 years ago

Jorgevillada commented 2 years ago

Hi, thank you for this project.

I been testing this project in kubernetes with 100/200 Req/sec and with invalid status code (> 299) there are some problems with the memory and a lot of OOMKilled in a day(10-15).

I tested this PR updated with the current version of the branch main(https://github.com/willnorris/imageproxy/pull/235) but it has other errors. (Probably channels closed)

error copying response: readfrom tcp 127.0.0.1:8080->127.0.0.1:53150: write tcp 127.0.0.1:8080->127.0.0.1:53150: write: broken pipe

This image is from prometheus with the current branch main. Memory incrase until OOMKilled happens image

I been trying some code modifications and this behavior changes when added this If (after this if https://github.com/willnorris/imageproxy/blob/main/imageproxy.go#L484-L487)

if should304(req, resp) {
        // bare 304 response, full response will be used from cache
        return &http.Response{StatusCode: http.StatusNotModified}, nil
}
if resp.StatusCode > 299 {
    return nil, fmt.Errorf("invalid http code %s: %s", req.URL.String(), resp.Status)
}

and this. https://github.com/willnorris/imageproxy/blob/main/imageproxy.go#L207 i moved it before this if.

resp, err := p.Client.Do(actualReq)
// close the original resp.Body, even if we wrap it in a NopCloser below
if resp != nil && resp.Body != nil {
  defer resp.Body.Close()
}

if err != nil {
  msg := fmt.Sprintf("error fetching remote image: %v", err)
  p.log(msg)
  http.Error(w, msg, http.StatusInternalServerError)
  metricRemoteErrors.Inc()
  return
}

I'm no sure if this status code validation works in all escenarios, but for me it works. This is prometheus with this change(1 hour before vs 9 hour now) image

The memory problem is still present, but the behavior is different.

I added "net/http/pprof" to try to find aditional strange behaviors. and I found this. image

It seems like a problem with cached request too.

I found this open PR https://github.com/gregjones/httpcache/pull/102/files

This is from pprof: Heap: https://drive.google.com/file/d/1lH9oYfO_88EAAMdxEwZV1FrcNS-3CaN-/view?usp=sharing goroutine: https://drive.google.com/file/d/1_jinUorbPVNLxEMrUbzXs5Z9ITT5_2MD/view?usp=sharing

Relates to: https://github.com/willnorris/imageproxy/issues/97 https://github.com/willnorris/imageproxy/issues/196 https://github.com/willnorris/imageproxy/issues/92

gardner commented 2 years ago

nginx can rate limit request:

limit_req_zone global zone=img:1m rate=2r/s;
server {
    listen       80;
    server_name example.com;

    location ^~ /img/ {
        limit_req zone=img burst=20;
        proxy_pass http://img:8080/;
    }
netik commented 2 years ago

Rate limiting isn't the answer when there are actual memory leaks to contend with.

I'm testing these patches on my install now.

gardner commented 2 years ago

This issue ultimately prevented me from using the software. I ended up converting everything to webp in a preprocessing batch job.

LoopControl commented 1 year ago

I'm running into this issue now with increased traffic -- every request is getting timed out with messages like:

error copying response: readfrom tcp 127.0.0.1:8080->127.0.0.1:39936: write tcp 127.0.0.1:8080->127.0.0.1:39936: write: broken pipe