unoconv / unoserver

Parallel Unoserver with different ports hangs/gets stuck #16

Open jerrylshen opened 2 years ago

jerrylshen commented 2 years ago

Hi, I'm not sure whether this is the right place to post; please close/remove this if it isn't.

I'm trying to parallelize the conversion of files to PDF using unoserver, based on this note at the bottom of the page: "You should be able to on a multi-core machine run several unoservers with different ports. There is however no support for any form of load balancing in unoserver, you would have to implement that yourself in your usage of unoconverter."

What I did:
I'm using joblib to parallelize across cores, assigning each instance an id value. Each instance runs os.system("unoserver --port " + str(port)). I see the output below, but it hangs there and never proceeds to the rest of the code, where the actual conversion would start.

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
INFO:unoserver:Starting unoserver.
INFO:unoserver:Command: libreoffice --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/tmp35cy3xi9 --accept=socket,host=127.0.0.1,port=2000,tcpNoDelay=1;urp;StarOffice.ComponentContext
INFO:unoserver:Starting unoserver.
INFO:unoserver:Command: libreoffice --headless --invisible --nocrashreport --nodefault --nologo --nofirststartwizard --norestore -env:UserInstallation=file:///tmp/tmp9n0aw9aj --accept=socket,host=127.0.0.1,port=2001,tcpNoDelay=1;urp;StarOffice.ComponentContext

^ As you can see, I'm using different ports: 2000 and 2001.
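One possible explanation for the hang, offered as a guess rather than a confirmed diagnosis: os.system waits for its command to exit, and a server process is meant to keep running, so the call never returns. The sketch below starts each server with subprocess.Popen, which returns immediately. The `launch` parameter is a hypothetical hook added only so the function can be tried without unoserver installed; the only unoserver flag used is the --port flag from the post above.

```python
import subprocess


def server_command(port):
    """Argument list for one unoserver instance (only --port, as in the post above)."""
    return ["unoserver", "--port", str(port)]


def start_servers(ports, launch=subprocess.Popen):
    """Start one server per port without waiting for any of them to exit.

    `launch` is a hypothetical hook for illustration; by default it is
    subprocess.Popen, which returns a process handle right away instead of
    blocking until the command finishes the way os.system does.
    """
    return {port: launch(server_command(port)) for port in ports}
```

A caller would keep the returned handles, for example `servers = start_servers([2000, 2001])`, run the conversions, and then call `terminate()` and `wait()` on each handle when done.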

How to go about parallelizing unoserver conversions?

dev environment: AWS Ubuntu instance, Python 3

Thanks

regebro commented 2 years ago

This is the correct place, and that should work in theory, although I have never tried joblib. I hope to get time to look at the unoserver backlog soon.

The output above looks correct. What happens when you try to start a conversion with unoconvert?

jerrylshen commented 2 years ago

Hi @regebro

I made some progress but hit another roadblock. I realized the unoservers have to be started in a separate process, so I put them in a separate file and ran that first.

In my PDF conversion file, I try to connect with this command:

command = "unoconvert --convert-to pdf --interface {} --port {} \"{}\" \"{}\"".format("127.0.0.1", str(port), input_path, output_path)
os.system(command)
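As a rough sketch of the "implement that yourself" load balancing, files can be handed to the running servers in round-robin order, with the same unoconvert command built as an argument list so that paths containing spaces need no shell quoting. The function names here are hypothetical; the flags are exactly the ones in the command above and may differ between unoserver versions.

```python
from itertools import cycle


def assign_round_robin(files, ports):
    """Pair each input file with a server port, cycling through the ports."""
    return list(zip(files, cycle(ports)))


def convert_command(port, input_path, output_path, interface="127.0.0.1"):
    """Build the unoconvert call from this thread as an argument list.

    The flags (--convert-to, --interface, --port) are copied from the command
    above; passing a list to subprocess.run avoids manual quoting of paths.
    """
    return [
        "unoconvert",
        "--convert-to", "pdf",
        "--interface", interface,
        "--port", str(port),
        input_path,
        output_path,
    ]
```

Each worker could then run `subprocess.run(convert_command(port, src, dst), check=True)` for the pairs it was assigned, so that a failed conversion raises instead of being silently ignored.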

This is my error output: unoserver.converter.NoConnectException: Connector : couldn't connect to socket (Connection refused)

This is all done on separate cores.

Any advice? Thanks!

Edit: I've also tried ports "0" and "1" for multicore testing, for both the servers and the PDF conversion client.
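"Connection refused" means nothing was listening on that address and port at the moment the client connected. One thing worth ruling out, though it is only a guess for this report, is that the client ran before the server had finished starting. The hypothetical helper below polls the port with the standard socket module and reports whether anything accepts a connection within a timeout.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening;
            # the connection is closed again straight away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("127.0.0.1", port)` before each unoconvert run separates "server not up yet" from "wrong port or interface": if it returns False after a generous timeout, the server is not listening where the client expects.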

regebro commented 2 years ago

That should work.

jerrylshen commented 2 years ago

Welp, it doesn't work though :/ But thanks for looking it over to check whether I'm missing anything obvious and that I'm on the right track. I'll keep debugging this.

jannek-aalto commented 1 year ago

If this is still an issue, you're probably hitting what I did: each conversion server instance now uses two TCP ports. Add --uno-port $((port+100)) or something. (See issue #29...)
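A small sketch of that suggestion, assuming a unoserver version that accepts both --port and --uno-port as described in this comment: give every instance its own pair of ports, with the second offset by 100. The helper names and the range check are illustrative, not part of unoserver.

```python
def port_pairs(base_port, count, uno_offset=100):
    """Return (port, uno_port) pairs in which no port number is used twice.

    The offset of 100 mirrors the $((port+100)) suggestion above; it must be
    at least `count`, otherwise the two ranges would overlap.
    """
    if count > uno_offset:
        raise ValueError("uno_offset must be >= count to keep ports distinct")
    return [(base_port + i, base_port + i + uno_offset) for i in range(count)]


def server_command(port, uno_port):
    """Argument list for one instance, using the two flags named in this thread."""
    return ["unoserver", "--port", str(port), "--uno-port", str(uno_port)]
```

For two instances starting at port 2000, this yields the pairs (2000, 2100) and (2001, 2101).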

regebro commented 6 months ago

Good point, but that change happened in October 2023, so that's not the original problem. :-D