Closed adamantal closed 2 months ago
Python memory management is generally good and we don't have any variables that could accumulate data, so I doubt the leak is in unoserver; it's more likely in LibreOffice.
On Sat, Aug 24, 2024, 14:31 Adam Antal @.***> wrote:
Hey!
First of all, thank you guys for this great project! We've just recently started using it, and it fits our use cases just right 👌
Description
Converting multiple documents over time accumulates memory and eventually results in an out of memory error.
Context
We hit #108 https://github.com/unoconv/unoserver/issues/108, and tried to tackle the issue, but did not have success. We're running unoserver in Kubernetes, so we've added a cloud-native solution: a liveness probe that checks whether unoserver is able to convert a very minimal, 1-page PDF document to PNG. This conversion usually finishes in under 1s, so it's ideal to run it periodically and restart the server if it does not respond within a given timeframe.
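For anyone wanting to try the same approach, here is a minimal sketch of such a probe in Python. It is not our actual probe: the function names, the timeout value, and the choice of "png" as the target format are assumptions for illustration, and the unoconvert flags are simply the ones used in the reproduction script in this thread.

```python
import subprocess


def probe(cmd, timeout_s):
    """Return True if `cmd` exits with status 0 within `timeout_s` seconds."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return result.returncode == 0


def unoconvert_cmd(host, port, src, dst, convert_to="png"):
    """Build the unoconvert call (hypothetical helper, flags as in this thread)."""
    return [
        "unoconvert",
        "--convert-to", convert_to,
        "--host", host,
        "--port", str(port),
        "--host-location", "local",
        src,
        dst,
    ]
```

A wrapper script would call `probe(unoconvert_cmd(...), timeout_s=5)` and exit with 0 on True and 1 on False, since a Kubernetes exec probe treats a zero exit status as healthy.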
What we experienced is that unoserver accumulates memory over time because of these small, periodic conversions. See the attached screenshot showing the memory usage over time in Grafana: Screenshot.2024-08-24.at.14.21.50.png https://github.com/user-attachments/assets/67ffa0d6-70a1-43c7-a4fd-bfea185c64c0 (Note that there are some sudden jumps that are due to some conversions; we do a few dozen daily. As you can see, these are also kept in memory.)
Steps to reproduce
I think a bash script like this would be able to reproduce this:
#!/bin/bash
while true; do
  unoconvert \
    --convert-to "pdf" \
    --host "$UNOSERVER_ADDRESS" \
    --port "$UNOSERVER_PORT" \
    --host-location "local" \
    "$TEST_FILE" \
    "$OUTPUT_FILE"
  rm -f "$OUTPUT_FILE"
  sleep 10
done
Other notes
I'm not super knowledgeable about the XMLRPC implementation in Python, but IMO it's unlikely that the memory leak is caused by the caller side (e.g. interrupting the unoconvert call).
After taking a look at the code, I could not locate the root cause of this. I have two suspicions: either some kind of caching or keeping the input in memory on the XMLRPC side, or the libreoffice process itself. If it's the latter, we'll work around it somewhat with an HA setup, but if it's the former, it might be a bug in unoserver itself, and you should be aware of that.
Also, any advice or suggestions on the configuration are appreciated.
Some extra info, if it's helpful. The libreoffice subprocess doesn't seem to increase its memory consumption:
$ while true; do ps -p 28 -o %mem,rss,comm; sleep 10; done
%MEM RSS COMMAND
1.8 291672 soffice.bin
%MEM RSS COMMAND
1.8 291648 soffice.bin
%MEM RSS COMMAND
1.8 291648 soffice.bin
%MEM RSS COMMAND
1.8 291648 soffice.bin
%MEM RSS COMMAND
1.8 291636 soffice.bin
while the unoserver does:
$ while true; do ps -p 7 -o %mem,rss,comm; sleep 10; done
%MEM RSS COMMAND
2.1 336424 unoserver
%MEM RSS COMMAND
2.1 343600 unoserver
%MEM RSS COMMAND
2.1 343600 unoserver
%MEM RSS COMMAND
2.1 346236 unoserver
%MEM RSS COMMAND
2.1 348620 unoserver
%MEM RSS COMMAND
2.2 350068 unoserver
%MEM RSS COMMAND
2.2 350068 unoserver
%MEM RSS COMMAND
2.2 354836 unoserver
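The same comparison can be scripted instead of read off by eye. Below is a rough sketch, assuming a `ps` that supports `-o rss=` (the trailing `=` suppresses the header); the helper names `rss_kib`, `sample` and `growth_kib` are made up for this example.

```python
import subprocess
import time


def rss_kib(pid):
    """Return the resident set size of `pid` in KiB, as reported by ps."""
    out = subprocess.run(
        ["ps", "-p", str(pid), "-o", "rss="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return int(out.strip())


def sample(pid, count, interval_s):
    """Collect `count` RSS samples of `pid`, sleeping `interval_s` between them."""
    samples = []
    for i in range(count):
        samples.append(rss_kib(pid))
        if i < count - 1:
            time.sleep(interval_s)
    return samples


def growth_kib(samples):
    """Return last-minus-first RSS in KiB (0 if fewer than two samples)."""
    if len(samples) < 2:
        return 0
    return samples[-1] - samples[0]
```

Applied to the numbers above, `growth_kib` gives 18412 KiB for unoserver (336424 to 354836) and -36 KiB for soffice.bin (291672 to 291636) over the sampled windows.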
I threw a few heap allocation and performance tools at it, but only tracemalloc was able to give me something valuable. The top 3 entries reported by tracemalloc.take_snapshot().statistics('lineno'):
/usr/lib/python3/dist-packages/uno.py:507: size=34.0 MiB, count=782923, average=46 B
/usr/local/lib/python3.12/dist-packages/unoserver/converter.py:123: size=839 KiB, count=15346, average=56 B
/usr/local/lib/python3.12/dist-packages/unoserver/converter.py:125: size=776 KiB, count=15280, average=52 B
The first seems to be the issue (I see that its memory increases with each call). Based on the source code, however, this seems to be a generic getter, so it doesn't actually tell us much. It's probably a dangling reference to a UNO object that is not picked up by the gc.
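To check which lines grow between calls rather than just which are largest, two snapshots can be diffed with `Snapshot.compare_to`. This is a generic sketch and not the exact code I ran: `top_growth` is a made-up helper, and the list of `bytes` objects only stands in for "run a few conversions".

```python
import tracemalloc


def top_growth(before, after, limit=3):
    """Return up to `limit` source lines whose allocated size grew the most."""
    stats = after.compare_to(before, "lineno")
    # compare_to sorts by absolute size difference, so filter out shrinkage.
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    before = tracemalloc.take_snapshot()

    # Stand-in workload that deliberately keeps its allocations referenced.
    leaked = [bytes(1024) for _ in range(1000)]

    after = tracemalloc.take_snapshot()
    for stat in top_growth(before, after):
        print(stat)
```

In a real server the two snapshots would be taken some number of conversions apart, and a line that shows up with a positive diff every time is the one to look at.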
No update? I'm having the same kind of problem...
@adamantal What version of unoserver are you using?
Fixed in 2.2.2.
Sorry, forgot to get back to this thread. We're on 2.1 currently, but will update to 2.2.2 and get back to you to confirm that the memory leak is solved.
2.2.2 may have a stability problem. I can't confirm that it's because of this change, but we did have some crashes, which are handled better in the various betas that I released this week. So testing 3.0b2 might be better.
had some API incompatibility issues, so we've finally bumped to 2.2.2 - didn't experience much instability
can confirm that the memory leak is resolved. thank you very much 🙏