webp-sh / webp_server_go

Go version of WebP Server. A tool that will serve your JPG/PNG/BMP/SVGs as WebP/AVIF format with compression, on-the-fly.
https://docs.webp.sh
GNU General Public License v3.0

Limiting the number of concurrent conversion tasks to reduce memory usage #75

Closed: xlfjn closed this issue 10 months ago

xlfjn commented 3 years ago

Is your feature request related to a problem? Please describe. The memory consumption of webp-server may grow to more than 500 MB when processing 5 concurrent requests that require WebP conversion. Such a workload places a heavy burden on conventional virtual servers (with 1 GB of RAM) and may trigger the OOM killer.

Describe the solution you'd like Implement a limit on the number of concurrent webp conversion tasks.

Describe alternatives you've considered The memory usage problem can be partially mitigated with the following options:

  1. Limit the number of concurrent requests at the HTTP server. This is a common strategy to keep a backend from being overloaded, but it is ill-suited for webp-server: conversion requests would block subsequent requests that could be fulfilled directly by returning the original image file.

  2. Convert every image file to WebP in advance with -prefetch. However, users still need to restart webp-server to perform the pre-conversion every time they add image files, and in the meantime concurrent requests might still cause excessive memory usage.

Neither method is optimal for real-world usage, so I think this problem should be addressed in webp-server itself.

BennyThink commented 3 years ago

It seems we're facing a memory leak issue: about 700 MiB of RAM is still in use after a batch conversion.

[Screenshot: RAM usage after batch conversion]

xlfjn commented 3 years ago

I also noticed this symptom while trying to determine the conditions of an OOM event. I removed the files in EXHAUST_PATH, made a list of image URLs on my site, and fed it to an HTTP benchmarking tool (siege, in my case).

After the benchmark completes, webp-server may still consume more than 150 MB of RAM, provided I set the concurrency low enough that it is not killed by systemd's MemoryMax (cgroup) limit during the run.

If, on the other hand, a set of already-converted URLs is fed to webp-server, its memory usage quickly returns to less than 10 MB after the benchmark. I don't have experience with Go, so I thought this might be a feature of Go's GC.
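
For context, Go's runtime returns freed heap memory to the OS lazily, so a high resident set right after a burst does not by itself prove a leak. A minimal, hypothetical probe (not part of webp-server) that forces a GC plus an OS release and prints the runtime's own heap accounting:

package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// printMem prints the Go runtime's own view of the heap, in MiB.
func printMem(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s: HeapAlloc=%dMiB HeapIdle=%dMiB HeapReleased=%dMiB\n",
		label, m.HeapAlloc>>20, m.HeapIdle>>20, m.HeapReleased>>20)
}

func main() {
	printMem("before")
	// Force a GC and return as much memory to the OS as possible; if RSS
	// drops afterwards, the memory was retained by the runtime, not leaked.
	debug.FreeOSMemory()
	printMem("after")
}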

BennyThink commented 3 years ago

From go pprof:

(pprof) top20 -cum
Showing nodes accounting for 649.90MB, 96.73% of 671.88MB total
Dropped 25 nodes (cum <= 3.36MB)
Showing top 20 nodes out of 24
      flat  flat%   sum%        cum   cum%
         0     0%     0%   671.38MB 99.93%  github.com/valyala/fasthttp.(*Server).serveConn
         0     0%     0%   671.38MB 99.93%  github.com/valyala/fasthttp.(*workerPool).getCh.func1
         0     0%     0%   671.38MB 99.93%  github.com/valyala/fasthttp.(*workerPool).workerFunc
         0     0%     0%   669.87MB 99.70%  github.com/gofiber/fiber/v2.(*App).handler
         0     0%     0%   669.37MB 99.63%  github.com/gofiber/fiber/v2.(*App).next
         0     0%     0%   669.37MB 99.63%  github.com/gofiber/fiber/v2.(*Ctx).Next
         0     0%     0%   669.37MB 99.63%  github.com/gofiber/fiber/v2/middleware/logger.New.func2
         0     0%     0%   669.37MB 99.63%  main.convert
         0     0%     0%   668.76MB 99.54%  main.webpEncoder
    1.01MB  0.15%  0.15%   582.47MB 86.69%  image/jpeg.Decode (inline)
         0     0%  0.15%   581.46MB 86.54%  image/jpeg.(*decoder).decode
  419.41MB 62.42% 62.57%   581.46MB 86.54%  image/jpeg.(*decoder).processSOS
  162.05MB 24.12% 86.69%   162.05MB 24.12%  image.NewYCbCr
         0     0% 86.69%   162.05MB 24.12%  image/jpeg.(*decoder).makeImg
         0     0% 86.69%    67.43MB 10.04%  github.com/chai2010/webp.Encode (inline)
   67.43MB 10.04% 96.73%    67.43MB 10.04%  github.com/chai2010/webp.NewRGBImage (inline)
         0     0% 96.73%    67.43MB 10.04%  github.com/chai2010/webp.NewRGBImageFrom
         0     0% 96.73%    67.43MB 10.04%  github.com/chai2010/webp.adjustImage
         0     0% 96.73%    67.43MB 10.04%  github.com/chai2010/webp.encode
         0     0% 96.73%    19.47MB  2.90%  bytes.(*Buffer).Grow (inline)
(pprof)
(pprof)
(pprof)
(pprof) top
Showing nodes accounting for 669.37MB, 99.63% of 671.88MB total
Dropped 25 nodes (cum <= 3.36MB)
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
  419.41MB 62.42% 62.42%   581.46MB 86.54%  image/jpeg.(*decoder).processSOS
  162.05MB 24.12% 86.54%   162.05MB 24.12%  image.NewYCbCr
   67.43MB 10.04% 96.58%    67.43MB 10.04%  github.com/chai2010/webp.NewRGBImage (inline)
   19.47MB  2.90% 99.48%    19.47MB  2.90%  bytes.makeSlice
    1.01MB  0.15% 99.63%   582.47MB 86.69%  image/jpeg.Decode
         0     0% 99.63%    19.47MB  2.90%  bytes.(*Buffer).Grow
         0     0% 99.63%    19.47MB  2.90%  bytes.(*Buffer).grow
         0     0% 99.63%    67.43MB 10.04%  github.com/chai2010/webp.Encode
         0     0% 99.63%    67.43MB 10.04%  github.com/chai2010/webp.NewRGBImageFrom
         0     0% 99.63%    67.43MB 10.04%  github.com/chai2010/webp.adjustImage

It seems image/jpeg.(*decoder).processSOS is taking up most of the RAM: nearly 62% in this test.
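
The dump above looks like go tool pprof output for a heap profile. A minimal sketch of how such a profile can be exposed via net/http/pprof; the side port and wiring are assumptions, not webp-server's actual setup:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Expose the profiling endpoints on a side port, then inspect the heap with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	// and run `top` or `top20 -cum` at the (pprof) prompt.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}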

n0vad3v commented 3 years ago

Yes, that's a bug.

I've made an attempt on a VPS: if there are concurrent requests for the same file, multiple conversions run at once, which consumes a lot of RAM, as in the screenshot below:

[Screenshot from 2021-03-02 21-06-07]

There is a simple way to resolve this, and I've made an attempt on https://github.com/webp-sh/webp_server_go/tree/convert_once_on_cocurrent_requests: create a lock file before converting, which blocks subsequent incoming requests for the same file; after a successful conversion, the following requests are simply served the converted image.
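
A minimal sketch of that idea (illustrative only, not the actual branch code; the in-memory lock map and the on-disk existence check are assumptions):

package main

import (
	"fmt"
	"os"
	"sync"
)

var locks sync.Map // source path -> *sync.Mutex

// convertOnce lets only one goroutine convert a given file at a time.
// Concurrent requests for the same file block on the mutex, then find the
// finished file on disk and skip the conversion entirely.
func convertOnce(src, dst string, convert func(src, dst string) error) error {
	m, _ := locks.LoadOrStore(src, &sync.Mutex{})
	mu := m.(*sync.Mutex)
	mu.Lock()
	defer mu.Unlock()
	if _, err := os.Stat(dst); err == nil {
		return nil // an earlier request already produced the converted file
	}
	return convert(src, dst)
}

func main() {
	err := convertOnce("pics/rms.jpg", "exhaust/rms.webp",
		func(src, dst string) error {
			fmt.Println("converting", src, "->", dst)
			return nil // the real conversion would happen here
		})
	fmt.Println("err:", err)
}

A lock file on disk, as in the branch, achieves the same single-flight effect but also works across processes.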

Test as below: [Screenshot from 2021-03-02 23-15-15]

Command used in test: ab -c 200 -n 1000 -H "User-Agent: Chrome" http://127.0.0.1:3333/rms.jpg

xlfjn commented 3 years ago

Nice work! However, I should clarify my usage of the phrase "concurrent requests": initially, the problem was triggered by a page with about 50 medium-sized JPEG photos, where the browser sends multiple requests targeting different URLs; that is what triggered the excessive memory usage. Still, if multiple concurrent requests to a single URL can trigger the issue, doesn't that show the cause lies in the conversion process itself?

n0vad3v commented 3 years ago

Yes, I understand you, and this is still a WIP. When there are multiple requests for multiple unconverted images, there is still a potential memory leak; currently there seems to be no way to avoid it entirely, though there are ways to mitigate it.

We've found that there may be problems in our underlying library, chai2010/webp, as similar issues have been reported against it and remain unsolved.

Currently, a potential mitigation is to cap the container's memory.

Example docker-compose.yml for creating a RAM-limited Docker container:

version: '3'

services:
  webp:
    image: webpsh/webps
    restart: always
    volumes:
      - ./path/to/pics:/opt/pics
      - ./path/to/exhaust:/opt/exhaust
    ports:
      - "127.0.0.1:3333:3333"
    deploy:
      resources:
        limits:
          memory: 200M

Use docker-compose --compatibility up -d to start the container; it will then be limited to the specified amount of RAM:

CONTAINER ID   NAME                                     CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
a04ed041606b   test_webp_1                              0.05%     15.51MiB / 200MiB     7.76%     9.21kB / 90kB     213kB / 0B        5

n0vad3v commented 3 years ago

Added commit: https://github.com/webp-sh/webp_server_go/commit/774556f1fb557c0f24766e3ffa3fcc50f3b1b1aa

It adds a new field, MAX_JOB_COUNT, to config.json to limit the maximum number of concurrent conversions.

Example config.json

{
  "HOST": "127.0.0.1",
  "PORT": "3333",
  "QUALITY": "80",
  "MAX_JOB_COUNT": "1",
  "IMG_PATH": "./pics",
  "EXHAUST_PATH": "./exhaust",
  "ALLOWED_TYPES": ["jpg","png","jpeg","bmp"]
}
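
A minimal sketch of how such a cap can be implemented in Go (illustrative only, not the actual commit): a buffered channel acts as a counting semaphore, so at most MAX_JOB_COUNT conversions run at once while further requests wait for a free slot.

package main

import (
	"fmt"
	"sync"
)

var jobs = make(chan struct{}, 1) // capacity would come from MAX_JOB_COUNT

// convertLimited blocks until a conversion slot is free, runs the
// conversion, then releases the slot for the next waiting request.
func convertLimited(path string, convert func(string)) {
	jobs <- struct{}{}        // acquire a slot; blocks when all are busy
	defer func() { <-jobs }() // release the slot
	convert(path)
}

func main() {
	var wg sync.WaitGroup
	for i := 1; i <= 5; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			convertLimited(fmt.Sprintf("%d.jpg", n), func(p string) {
				fmt.Println("converting", p)
			})
		}(i)
	}
	wg.Wait()
}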

Benchmark

Prepare the environment:

git checkout convert_once_on_cocurrent_requests
cd builds
./webp-server-linux-amd64 -dump-config > config.json
mkdir pics && cd pics
cp /path/to/image.jpg ./
for i in {1..20}; do cp image.jpg $i.jpg;done
cd ..
./webp-server-linux-amd64 -v

Bench script:

#!/bin/bash
for i in {1..20}; do
  curl -H "User-Agent: Chrome" "http://127.0.0.1:3333/${i}.jpg" && echo "done" &
done
wait

Before mitigation: [Screenshot from 2021-03-04 13-36-00]

After mitigation: [Screenshot from 2021-03-04 13-35-13]

xlfjn commented 3 years ago

I guess the culprit has already been pointed out in chai2010/webp#26. Maybe you can figure out a way to properly free the reference to the decoded image?

In the meantime, I will probably set a limit on the memory usage of this service and configure the HTTP server to retry, by repeating webp-server's listening address a few times in the reverse proxy's load-balancing configuration.

n0vad3v commented 1 year ago

@xlfjn Hi~ We've switched to github.com/davidbyttow/govips for the underlying WebP conversion since version 0.6.0, and adding MALLOC_ARENA_MAX=1 to the environment seems to greatly reduce RAM usage (it restricts glibc's malloc to a single arena, which reduces fragmentation at some cost in allocation speed). Could you please give it a try?

Related issue: https://github.com/webp-sh/webp_server_go/issues/198

bugfest commented 1 year ago

Implemented by PR #226