tleyden / open-ocr

Run your own OCR-as-a-Service using Tesseract and Docker
Apache License 2.0

memory consumption #109

Open xf0e opened 5 years ago

xf0e commented 5 years ago

Hey tleyden!

I'm looking into memory consumption by both the worker and the HTTP daemon, and it can be very high. Starting from a low memory profile of just under 1 MB, it grows with every request. In my test the memory consumption grew to over 2 GB for 10 requests with a TIFF image about 30 MB in size. I tried to debug a little, but lacking the experience I only found out that most of the memory is allocated on the heap by bytes.makeSlice. This example shows consumption after one request with a 30 MB payload total.

File: cli_httpd
Type: inuse_space
Time: Dec 14, 2018 at 3:32pm (CET)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top20
Showing nodes accounting for 83.50MB, 100% of 83.50MB total
Showing top 20 nodes out of 25
      flat  flat%   sum%        cum   cum%
      83MB 99.40% 99.40%       83MB 99.40%  bytes.makeSlice
    0.50MB   0.6%   100%     0.50MB   0.6%  net/http.init.2
         0     0%   100%       32MB 38.32%  bytes.(*Buffer).ReadFrom
         0     0%   100%       51MB 61.08%  bytes.(*Buffer).Write
         0     0%   100%       83MB 99.40%  bytes.(*Buffer).grow
         0     0%   100%       51MB 61.08%  encoding/base64.(*encoder).Write
         0     0%   100%       51MB 61.08%  encoding/json.(*encodeState).marshal
         0     0%   100%       51MB 61.08%  encoding/json.(*encodeState).reflectValue
         0     0%   100%       51MB 61.08%  encoding/json.(*structEncoder).encode
         0     0%   100%       51MB 61.08%  encoding/json.(*structEncoder).encode-fm
         0     0%   100%       51MB 61.08%  encoding/json.Marshal
         0     0%   100%       51MB 61.08%  encoding/json.encodeByteSlice
         0     0%   100%       83MB 99.40%  github.com/xf0e/open-ocr.(*OcrHttpHandler).ServeHTTP
         0     0%   100%       32MB 38.32%  github.com/xf0e/open-ocr.(*OcrRequest).downloadImgUrl
         0     0%   100%       83MB 99.40%  github.com/xf0e/open-ocr.(*OcrRpcClient).DecodeImage
         0     0%   100%       83MB 99.40%  github.com/xf0e/open-ocr.HandleOcrRequest
         0     0%   100%       32MB 38.32%  github.com/xf0e/open-ocr.url2bytes
         0     0%   100%       32MB 38.32%  io/ioutil.ReadAll
         0     0%   100%       32MB 38.32%  io/ioutil.readAll
         0     0%   100%     0.50MB   0.6%  main.init
(pprof) list makeSlice
Total: 83.50MB
ROUTINE ======================== bytes.makeSlice in /usr/lib/go/src/bytes/buffer.go
      83MB       83MB (flat, cum) 99.40% of Total
         .          .    226:   defer func() {
         .          .    227:           if recover() != nil {
         .          .    228:                   panic(ErrTooLarge)
         .          .    229:           }
         .          .    230:   }()
      83MB       83MB    231:   return make([]byte, n)
         .          .    232:}
         .          .    233:
         .          .    234:// WriteTo writes data to w until the buffer is drained or an error occurs.
         .          .    235:// The return value n is the number of bytes written; it always fits into an
         .          .    236:// int, but it is int64 to match the io.WriterTo interface. Any error
(pprof) list DecodeMessage
Total: 83.50MB

I'm trying to understand where the additional 50 MB came from, but I can't figure it out. I assume the memory load comes from storing the payload as []byte in the function url2bytes.
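For what it's worth, the 51 MB under encoding/json in the profile is consistent with json.Marshal base64-encoding the []byte payload: base64 inflates the data by 4/3, and the encoder's internal bytes.Buffer grows by doubling, which is exactly the bytes.makeSlice allocation at the top. A minimal sketch (the struct is a hypothetical stand-in for the real request type):

package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-in for the request struct: json.Marshal encodes
// []byte fields as base64, so the marshaled form is ~4/3 the raw size.
type ocrRequest struct {
	ImgBytes []byte `json:"img_bytes"`
}

func main() {
	raw := make([]byte, 30<<20) // a 30 MB payload, like the test image
	out, err := json.Marshal(ocrRequest{ImgBytes: raw})
	if err != nil {
		panic(err)
	}
	// Prints "raw: 30 MB, marshaled: 40 MB" -- the raw copy plus the
	// base64 text already account for ~70 MB before buffer doubling.
	fmt.Printf("raw: %d MB, marshaled: %d MB\n", len(raw)>>20, len(out)>>20)
}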

Is it possible to rewrite the function to store the payload on a file system and pass the path to the file over rabbitmq instead of the whole payload?

The memory consumption by the worker is even higher, I think for the same reason, but I can't prove it because I was not able to get the pprof data for this part of open-ocr.
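Presumably the worker could expose pprof the same way; a minimal sketch, assuming a side listener can be added (the port is arbitrary, not part of open-ocr):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Profiling endpoint served alongside the worker.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start the OCR worker as usual ...
	select {} // placeholder for the worker's main loop
}

With that in place, go tool pprof http://localhost:6060/debug/pprof/heap should produce the same kind of inuse_space profile as above.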

I would try to rewrite the code; I just need to know which parts are involved (I don't understand whether the rabbitmq part will be involved).

Are you interested in working through this problem together and providing me some hints?

Regards!

tleyden commented 5 years ago

Is it possible to rewrite the function to store the payload on a file system and passing over rabbitmq the path to the file instead of the whole payload?

I don't see how that would work, since typically the processes will be running on different machines. It would only work if there were a shared/common filesystem, which won't exist on every system.

tleyden commented 5 years ago

On my test the memory consumption grew to over 2GB for 10 requests with TIFF image about 30Mb of size.

Can you give details on how you are determining the memory usage?

Also, are you sure these requests are being serviced? E.g., that they aren't just getting stuck in rabbitmq without any workers to process them? In that case, I'd expect the memory to increase in the way you observed.

I did a quick pass over the code and no obvious memory leaks jumped out at me.

xf0e commented 5 years ago

Hm, you're right, I forgot to take this into account... I need to rethink the idea; there must be a solution for this.

Can you help me find where (in the case above) the additional 50 MB comes from? This alone would drop memory consumption by two thirds. I tried to investigate further and found out that memory is returned to the OS 8-10 minutes after the OCR text is delivered to the client.
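(The 8-10 minute delay matches the Go runtime's behaviour: freed heap is only returned to the OS by the background scavenger several minutes later. If the resident footprint between requests is the concern, it can be handed back eagerly; a sketch, assuming a call can be added after each delivered response:)

package main

import "runtime/debug"

func main() {
	// ... handle a request and deliver the OCR result ...

	// Force a GC and return as much freed memory to the OS as possible,
	// instead of waiting for the scavenger. This trades some CPU for a
	// lower resident footprint; it does not fix an actual leak.
	debug.FreeOSMemory()
}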

xf0e commented 5 years ago

Sorry, I accidentally closed the issue; it was not intended.

My testing procedure:

All requests got a valid reply from open-ocr, and the memory usage went down to just 40 MB and 9 MB about 10 minutes after delivery.

tleyden commented 5 years ago

All requests got a valid reply from open-ocr, and the memory usage went down to just 40 MB and 9 MB about 10 minutes after delivery.

So it sounds like you are saying the memory is eventually being returned to the OS? Are you just trying to avoid the memory spikes from happening at all? Or are you seeing what appears to be a leak, where it just keeps increasing? You mentioned you saw it grow to 2 GB.

tleyden commented 5 years ago

The gctrace section here might shed some more light on it: https://dave.cheney.net/tag/godebug
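For example, running either binary with GODEBUG=gctrace=1 makes the runtime print a summary line after every collection (heap sizes before and after, plus the scvgN lines showing what the scavenger returns to the OS), which should show whether the heap itself keeps growing or is just slow to be handed back.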

xf0e commented 5 years ago

There are two problems with the memory footprint.

You can see how the memory footprint looks over time (thanks for the gctrace link!). The memory used by cli_worker won't be released completely.

[images: cli_httpd and cli_worker memory graphs]

Thanks!

tleyden commented 5 years ago

The non-successful requests (with timeout) are still holding the memory.

I think the problem is here:

https://github.com/tleyden/open-ocr/blob/4fd4a2ac2a09c30684729d9c41d145ebafa5d4cc/ocr_rpc_client.go#L215

If it times out waiting for a worker response, then nothing will be listening on that rpcResponseChan and so rpcResponseChan <- ocrResult will block forever.
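The pattern in miniature (a sketch, not the actual open-ocr code): once the handler gives up, an unbuffered send has no receiver, pinning both the goroutine and the result it references, image bytes included. A buffer of one lets the late sender complete and exit:

package main

import (
	"fmt"
	"time"
)

type ocrResult struct{ Text string }

func main() {
	// Buffered with capacity 1: the late send below completes even after
	// the handler has timed out, so the goroutine exits and the result
	// becomes garbage. With an unbuffered channel the send would block
	// forever, leaking the goroutine and everything it references.
	rpcResponseChan := make(chan ocrResult, 1)

	go func() {
		time.Sleep(2 * time.Second) // worker reply arrives too late
		rpcResponseChan <- ocrResult{Text: "Lorem ipsum"}
	}()

	select {
	case res := <-rpcResponseChan:
		fmt.Println("got:", res.Text)
	case <-time.After(1 * time.Second):
		fmt.Println("Timeout waiting for RPC response")
	}
}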

Do you see "send result to rpcResponseChan" messages without corresponding "sent result to rpcResponseChan" messages?

Alternatively, do you see extra goroutines stuck there? (here are some ways to dump the goroutine stacks)

xf0e commented 5 years ago

This is the output. Last succeeded ocr request and the rest what didn't make in time:

22:57:28.143282 OCR_CLIENT: got 20356B delivery: [1] "Lorem ipsum"
22:57:28.143302 OCR_CLIENT: send result to rpcResponseChan
22:57:28.143304 OCR_CLIENT: sent result to rpcResponseChan
22:57:28.144554 OCR_HTTP: ocrResult: {Lorem Ipsum }
22:57:34.550229 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
22:57:34.550242 ERROR: Unable to perform OCR decode. Error: Timeout waiting for RPC response -- open-ocr.(OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
22:57:36.157883 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
22:57:36.157894 ERROR: Unable to perform OCR decode. Error: Timeout waiting for RPC response -- open-ocr.(OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
22:57:37.794640 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
22:57:37.794651 ERROR: Unable to perform OCR decode. Error: Timeout waiting for RPC response -- open-ocr.(OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
22:57:39.532304 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
22:57:39.532317 ERROR: Unable to perform OCR decode. Error: Timeout waiting for RPC response -- open-ocr.(OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40

For the requests that timed out, there is no "send result to rpcResponseChan" at all.

tleyden commented 5 years ago

I wonder if the memory leak would be reproducible in a simpler case -- what if there is no worker running? Do you see the memory just keep increasing as you add jobs that are never processed?

xf0e commented 5 years ago

There are at least two cases of leaking memory. Both can be reproduced really easily.

The first case:

what if there is no worker running?

only cli_httpd and rabbitmq running:

Then all requests time out, and cli_httpd will release the memory at some point: [image: memory graph]

21:26:27.431007 OCR_HTTP: Starting listener on :8080
21:26:58.560339 OCR_HTTP: serveHttp called
21:26:58.561176 OCR_CLIENT: dialing "amqp://guest:guest@localhost:5672/"
21:26:58.569238 OCR_CLIENT: callbackQueue name: amq.gen-ZpRyOndmB4PozCcbS6ZOGg
21:26:58.569582 OCR_CLIENT: looping over deliveries..
21:26:58.648990 OCR_CLIENT: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
21:26:58.649003 OCR_CLIENT: publishing with routing key "decode-ocr"
21:26:58.649008 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
21:27:50.653017 OCR_HTTP: serveHttp called
21:27:50.653074 OCR_CLIENT: dialing "amqp://guest:guest@localhost:5672/"
21:27:50.657004 OCR_CLIENT: callbackQueue name: amq.gen-gfWBqIeFQ6RMFHzS0KoSXA
21:27:50.657382 OCR_CLIENT: looping over deliveries..
21:27:50.684407 OCR_CLIENT: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
21:27:50.684417 OCR_CLIENT: publishing with routing key "decode-ocr"
21:27:50.684431 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
21:28:01.923367 OCR_HTTP: serveHttp called
21:28:01.923432 OCR_CLIENT: dialing "amqp://guest:guest@localhost:5672/"
21:28:01.928467 OCR_CLIENT: callbackQueue name: amq.gen-lIBccn0D6zWnrXaBaedItw
21:28:01.928843 OCR_CLIENT: looping over deliveries..
21:28:01.966831 OCR_CLIENT: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
21:28:01.966842 OCR_CLIENT: publishing with routing key "decode-ocr"
21:28:01.966846 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
21:28:58.778786 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
21:28:58.778826 ERROR: Unable to perform OCR decode. Error: Timeout waiting for RPC response -- open-ocr.(*OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
21:29:50.780741 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
21:29:50.780766 ERROR: Unable to perform OCR decode. Error: Timeout waiting for RPC response -- open-ocr.(*OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
GC forced
21:30:02.062150 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
21:30:02.062169 ERROR: Unable to perform OCR decode. Error: Timeout waiting for RPC response -- open-ocr.(*OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
GC forced
GC forced
scvg2: 0 MB released
GC forced
GC forced
scvg3: 382 MB released

Rabbitmq will preserve the messages: [image: rabbitmq queue]

What we are missing at this point is deleting the corresponding messages from the rabbitmq queue.
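(One possible fix, not in the current code: publish each request with a per-message TTL matching the HTTP timeout, so rabbitmq discards messages no handler is waiting for anymore. A sketch against the streadway/amqp API, with a placeholder exchange name:)

package ocr

import "github.com/streadway/amqp"

// publishWithTTL is a sketch: rabbitmq drops the message once the TTL
// expires, so timed-out requests no longer pile up in the queue.
func publishWithTTL(ch *amqp.Channel, body []byte) error {
	return ch.Publish(
		"open-ocr-exchange", // exchange (placeholder, not the real name)
		"decode-ocr",        // routing key, as seen in the logs above
		false,               // mandatory
		false,               // immediate
		amqp.Publishing{
			ContentType: "application/json",
			Body:        body,
			Expiration:  "120000", // TTL in milliseconds, matching the 120 s timeout
		},
	)
}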

When we then start cli_worker, it immediately gets all the messages from the queue and performs OCR on them. This is bad, since no one will ever receive the results, but there is no memory leakage in this case: [image: memory graph]

21:43:44.248453 OCR_WORKER: Creating new OCR Worker
21:43:44.248482 OCR_WORKER: Run() called...
21:43:44.248496 OCR_WORKER: dialing "amqp://guest:guest@localhost:5672/"
21:43:44.253650 OCR_WORKER: got Connection, getting Channel
21:43:44.254781 OCR_WORKER: binding to: decode-ocr
21:43:44.255152 OCR_WORKER: Queue bound to Exchange, starting Consume (consumer tag "foo")
21:43:44.354363 OCR_WORKER: got 40648128 byte delivery: [1]. Routing key: decode-ocr Reply to: amq.gen-ZpRyOndmB4PozCcbS6ZOGg
21:43:44.841885 OCR_TESSERACT: Use tesseract with bytes image
21:43:44.851449 OCR_TESSERACT: cmdArgs: [/tmp/23e0f1dd-85ca-4a31-772a-91960491d9db /tmp/23e0f1dd-85ca-4a31-772a-91960491d9db]
21:44:09.915722 OCR_TESSERACT: checking if exists: /tmp/23e0f1dd-85ca-4a31-772a-91960491d9db.txt
21:44:09.917860 OCR_WORKER: Sending rpc response: {Lorem Ipsum}
21:44:09.917870 OCR_WORKER: sendRpcResponse to: amq.gen-ZpRyOndmB4PozCcbS6ZOGg
21:44:09.917975 OCR_WORKER: sendRpcResponse succeeded
21:44:09.917982 OCR_WORKER: got 40648128 byte delivery: [2]. Routing key: decode-ocr Reply to: amq.gen-gfWBqIeFQ6RMFHzS0KoSXA
21:44:10.349823 OCR_TESSERACT: Use tesseract with bytes image
21:44:10.359666 OCR_TESSERACT: cmdArgs: [/tmp/8752631a-1be0-4905-74de-8f262e405049 /tmp/8752631a-1be0-4905-74de-8f262e405049]
21:44:36.289734 OCR_TESSERACT: checking if exists: /tmp/8752631a-1be0-4905-74de-8f262e405049.txt
21:44:36.292069 OCR_WORKER: Sending rpc response: {Lorem Ipsum}
21:44:36.292081 OCR_WORKER: sendRpcResponse to: amq.gen-gfWBqIeFQ6RMFHzS0KoSXA
21:44:36.293107 OCR_WORKER: sendRpcResponse succeeded
GC forced
GC forced
scvg1: 0 MB released
GC forced
scvg2: 231 MB released
GC forced

The second case

This is the case described before this post. We have cli_httpd, cli_worker and rabbitmq running. Now we push just enough requests into the queue that at least one won't make it in time. If we have enough RAM on the cli_httpd and rabbitmq machines, we won't crash here. Now the worker kicks in... All requests which time out will not release the memory belonging to them. It is actually really easy to reproduce:

  1. Start one cli_httpd, one cli_worker and rabbitmq. The standard timeout is 120 seconds in the sources.
  2. Now, on one or more ttys, call curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://localhost:8000/testimage.tif","engine":"tesseract"}' http://localhost:8080/ocr X times, where X depends on the test image. In my case it is a 30 MB image with 6 pages on it. On my test machine, the 4th or 5th request and onwards will receive a timeout, and cli_worker will leak memory for such requests (a Go driver for this step is sketched below).
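A minimal Go driver for step 2, in case repeated curl calls are awkward (the URLs match the repro above; the request count is arbitrary):

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"sync"
)

func main() {
	payload := []byte(`{"img_url":"http://localhost:8000/testimage.tif","engine":"tesseract"}`)
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ { // X = 8: enough that the later requests hit the 120 s timeout
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			resp, err := http.Post("http://localhost:8080/ocr", "application/json", bytes.NewReader(payload))
			if err != nil {
				fmt.Println(n, "error:", err)
				return
			}
			resp.Body.Close()
			fmt.Println(n, "status:", resp.Status)
		}(i)
	}
	wg.Wait()
}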
xf0e commented 5 years ago

I got a workaround for the leaking memory in cli_worker, but I think we should address the actual issue, since this workaround would not help if the request which is being processed times out.

w.channel.Qos(1, 0, true) in ocr_rpc_worker.go in the Run() function will prevent the worker from getting more than one message at a time from the queue.
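For reference, a sketch of where that lands in the streadway/amqp API (the function wrapper is mine, not open-ocr's):

package ocr

import "github.com/streadway/amqp"

// limitPrefetch applies the workaround: with a prefetch count of 1,
// rabbitmq holds back further deliveries until the current message is
// acknowledged, so a slow OCR job no longer pulls every queued 30 MB
// payload into the worker's memory at once.
func limitPrefetch(ch *amqp.Channel) error {
	return ch.Qos(
		1,    // prefetchCount: at most one unacknowledged message
		0,    // prefetchSize: no byte-count limit
		true, // global, as in the workaround above
	)
}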

tleyden commented 5 years ago

Ok thanks for the explanations and the steps to repro.