unoconv / unoserver

MIT License

separate unoconvert so it can be run from user virtualenv without uno or system-site-packages #29

Open hottwaj opened 2 years ago

hottwaj commented 2 years ago

This would completely isolate unoconvert and make use of it in virtualenvs safer

unoconvert is already a remote process that in theory doesn't need access to uno directly (via import, etc), and using system-site-packages is a bit fragile: if your virtualenv is not set up correctly you might accidentally import a system package instead of one you thought you had in your virtualenv.

Thanks for great package!

jannek-aalto commented 2 years ago

There should be no need for libreoffice packages etc. at all on a unoconvert-client-only host. It's horrible to install (and update) the whole of libreoffice as a dependency for what's probably a very simple TCP thingy. I'm not fluent with Python, at all, unfortunately.

regebro commented 2 years ago

That's not possible, sorry. The communication is handled by the uno library that is a part of Libreoffice.

hottwaj commented 2 years ago

Hi there, it is possible and there are a few ways to do it.

One way (I did something similar recently) is to use python's xmlrpc to add an rpc server to converter.py. The server would need to wrap Converter.convert so that it can be called from another process via rpc.

An xmlrpc client would also be needed, but there is only one function to wrap, so it would be quick to do.

The xmlrpc client would be installable in virtualenvs. Users would need to take care of starting the server/converter before using it via the xmlrpc client.

See e.g. https://docs.python.org/3/library/xmlrpc.server.html#module-xmlrpc.server
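A minimal sketch of that idea, using only the standard library. The `convert` function here is a hypothetical stand-in that just tags the bytes it receives; in the proposal it would wrap Converter.convert and run under the system Python that can import uno. The function names, the loopback address and the use of a random free port are assumptions for illustration, not unoserver's API.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def convert(data, convert_to):
    """Hypothetical stand-in for a wrapper around Converter.convert."""
    # With default settings, binary arguments arrive as xmlrpc.client.Binary.
    payload = data.data
    return xmlrpc.client.Binary(convert_to.encode() + b":" + payload)


def start_server(host="127.0.0.1", port=0):
    """Serve `convert` from a background thread; port 0 picks a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(convert, "convert")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_convert(url, payload, convert_to):
    """Client side: standard library only, so it can live in a virtualenv."""
    with xmlrpc.client.ServerProxy(url) as proxy:
        return proxy.convert(xmlrpc.client.Binary(payload), convert_to).data


server = start_server()
host, port = server.server_address
result = remote_convert(f"http://{host}:{port}", b"hello", "pdf")
server.shutdown()
server.server_close()
```

The client never imports uno; only the process hosting `convert` would need LibreOffice.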

hottwaj commented 2 years ago

I've added in #32 rough changes that would be needed to add the rpc server. The client will also be quite simple

regebro commented 2 years ago

Yes, sure we could build a completely different rpcserver and do this, but that's not what this module does, and you could do that on your side as well. The convert server in your case would still have the exact same problem, it needs to have access to the uno library. Yes, you would now get a "client client" that does not, but that doesn't really solve the issue itself. You still need to install unoserver so it has access to LibreOffice.

hottwaj commented 2 years ago

Thanks & yes agreed that this library would still need to be installed into system python env along with libreoffice+uno, there is no way around that.

The problem I was trying to solve is that other projects need to use this library, but at the moment they can only do so by either: i) running converter.py as a separate process (taking care to use system python not virtualenv python), or ii) installing the project into system python env. Both have their downsides, especially the latter: you risk breaking your system python or being unable to satisfy the project's dependencies.

Also, as an improvement to the suggested structure, it would make more sense to run the rpc server in the same process that starts soffice (rpc server would still run from system python env). All the conversion code could be moved back into a "unoserver+rpc" module and the converter client could be very lightweight, using only RPC for conversion via both command line and API. This would also allow the converter to be installed in virtualenvs. Thanks!

regebro commented 2 years ago

It would make it possible to make a Python library to make conversions, so that's a benefit.

regebro commented 2 years ago

Having unoserver start a server, for example with XML-RPC, and letting unoconvert use that instead is something we are fine with maintaining. It doesn't look like I will have time to implement it at the moment, but we are happy to accept contributions.

I would like the protocol to be versioned somehow, so that we give an explicit error message if client and server end up on mismatched versions, and possibly even support several versions.
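One way such a version check could look, as a sketch only. It assumes the server exposes some info-style RPC call returning a dict with an `api_version` key; that key, the `API_VERSION` constant and the exception name are all hypothetical, not something the thread specifies.

```python
API_VERSION = 1  # hypothetical protocol version this client was written for


class ProtocolVersionError(RuntimeError):
    """Raised when client and server speak different protocol versions."""


def check_protocol(server_info, supported=(API_VERSION,)):
    """Return the server's version, or fail early with an explicit message."""
    version = server_info.get("api_version")
    if version not in supported:
        raise ProtocolVersionError(
            f"server speaks protocol version {version!r}, "
            f"client supports {sorted(supported)}"
        )
    return version
```

Passing a longer `supported` tuple is how a client could accept several protocol versions at once.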

regebro commented 10 months ago

2.0a is out now with an XML-RPC server

jannek-aalto commented 10 months ago

> 2.0a is out now with an XML-RPC server

A great improvement! Will start testing ASAP and move into production rather quickly, I assume... Can't wait to drop libreoffice from application servers. Excellent, that the RPC-version already seems capable of shipping the payload too - we've been using NFS, but it provides the converter servers too much access (due to bad application design, which is out of our hands).

regebro commented 10 months ago

Heads up! I noticed the --daemon argument doesn't work in 2.0b1. If you need that I'll release a 2.0b2.

jannek-aalto commented 9 months ago

Daemons are kind of redundant these days, with systemd it's much better to run service processes in the foreground.

I managed to get the server bit running on a dedicated conversion server for a test (moodle) environment, ie. installed 2.0b1 over what was there previously and it still just magically works.

But then, I hit sort of a brick wall. Sorry for the stupid question, but how might one actually run the light client? Seems it's a library at this stage, needing a bit of a wrapper script to work, which unfortunately is beyond my pretty much nonexistent python skills. Getting it to run with the RHEL8.8 system python3 3.6 would rock.

We now have a situation where transferring the data to convert and the resulting file via the XML-RPC mechanism (instead of NFS access currently used elsewhere) would be pure gold too...

regebro commented 9 months ago

The unoconvert client script acts as that wrapper, so you would still use it the same way, by starting unoserver, and then using unoconvert to convert files.
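For anyone who wants to drive the unoconvert script from Python rather than from a shell, a minimal sketch follows. It only uses the options that appear elsewhere in this thread (--host, --port, --convert-to, then input and output file); the helper names are hypothetical and the host and port are passed in explicitly rather than assuming defaults.

```python
import subprocess


def build_unoconvert_command(infile, outfile, convert_to, host, port,
                             unoconvert="unoconvert"):
    """Build the argument list for one conversion via a remote unoserver."""
    return [
        unoconvert,
        "--host", host,
        "--port", str(port),
        "--convert-to", convert_to,
        infile,
        outfile,
    ]


def convert_file(infile, outfile, convert_to, host, port):
    """Run unoconvert; raises subprocess.CalledProcessError on failure."""
    cmd = build_unoconvert_command(infile, outfile, convert_to, host, port)
    subprocess.run(cmd, check=True)
```

A call such as `convert_file("in.docx", "out.pdf", "pdf", "1.2.3.4", 2000)` would need unoconvert on the PATH and a reachable unoserver; the address here is just the placeholder used later in the thread.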

jannek-aalto commented 9 months ago

Ah. Thanks for that golden tip. Running setup install with the system python and a custom prefix actually works, I did get the whole thing running and a remote conversion working. Had a bit of head-banging with haproxy on the conversion server, as I didn't first realize one now needs to specify --uno-port too... Didn't help that I use ports starting at 2001 for the workers! 2002.. doh. :D Just some final tweaks, packaging, distribution, testing, documentation and we're golden. Big thanks from Aalto University, we'll be very early adopters for 2.0.

regebro commented 9 months ago

Great stuff! Thanks for that, I'll release a 2.0 final shortly.

jannek-aalto commented 8 months ago

Happy to report that after some stress- and other testing we went into production (with 2.0b1) on our main Moodle instance, so now there are quite a lot of (end user) beta testers involved. :D

In case you'd like to add some docs/tips/examples, I've written a wrapper script which emulates unoconv 0.7, which Moodle directly supports - hopefully, can get rid of it via native support some day, but it works. Other than that, the most interesting bits are probably the converter service multi-instance systemd unit (ie. 'systemctl enable unoserver@2001 unoserver@2002') and maybe some haproxy config tips. (Perhaps the non-interactive installation scripts for everything too, but they're a bit too specific for our environment to share...)

[Unit]
Description=Unoserver document conversion service
Documentation=https://github.com/unoconv/unoserver
Wants=network-online.target
After=network-online.target
StartLimitIntervalSec=600
StartLimitBurst=5

[Service]
User=apache
Group=apache
ExecStart=/opt/bin/unoserver-wrapper %i
TimeoutStartSec=60
TimeoutStopSec=15
RestartSec=10s
Restart=always

[Install]
WantedBy=multi-user.target

/opt/bin/unoserver-wrapper:

#!/bin/sh
# %i from the unit file arrives as $1; exec so systemd supervises unoserver itself
exec /opt/bin/unoserver --interface 0.0.0.0 --port "$1" --uno-port "$(($1+100))"
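
The port scheme in the wrapper above (UNO port = instance port + 100) can be sanity-checked before enabling more instances. This is a small sketch with a hypothetical helper name; the offset of 100 is taken from the wrapper, and everything else is illustrative.

```python
def port_plan(instance_ports, uno_offset=100):
    """Map each instance port to its UNO port and reject any port used twice."""
    ports = list(instance_ports)
    uno_ports = [port + uno_offset for port in ports]
    used = ports + uno_ports
    clashes = sorted({port for port in used if used.count(port) > 1})
    if clashes:
        raise ValueError(f"port(s) used twice: {clashes}")
    return dict(zip(ports, uno_ports))
```

With too small an offset, neighbouring instances collide, which is the kind of clash described earlier with workers starting at 2001 and a UNO port of 2002.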

haproxy config bits (from a keepalived+haproxy balancer pair, shared with other related use):

defaults
    mode                    tcp
    log                     global
    retries                 3
    timeout queue           45s
    timeout connect         5s
    timeout client          5m
    timeout server          5m
    timeout check           10s
    maxconn                 1000
    balance                 roundrobin
    log-format              %ci:%cp\ %fi:%fp\ %bi:%bp\ %si:%sp\ %b/%s\ %U/%B\ %t

listen unoserver_prod
  bind 1.2.3.4:2000 name unoserver
  default-server inter 20s fastinter 5s downinter 10s
  server unosrvp1_2001 1.2.3.5:2001 maxconn 1 check
  server unosrvp1_2002 1.2.3.5:2002 maxconn 1 check
  server unosrvp2_2001 1.2.3.6:2001 maxconn 1 check
  server unosrvp2_2002 1.2.3.6:2002 maxconn 1 check

(In production, we actually use 2 converter servers with 6 libreoffice 7.6.2's running on each.)
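
When debugging a balancer setup like this, it can help to confirm by hand that each backend accepts TCP connections at all, which is roughly what a plain `check` does in tcp mode. A minimal sketch; the function name and timeout are assumptions, and the addresses in the usage note are the placeholders from the config above.

```python
import socket


def backend_is_up(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `backend_is_up("1.2.3.5", 2001)` would probe the first server line. A successful connect only shows that the port is open, not that a conversion would succeed.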

Last but not least, the unoconv-remote translator script for Moodle (uses a static document formats list generated with unoconv 0.7):

#!/bin/ksh
typeset -i EC=0
CONFIGFILE=/opt/etc/unoconv-remote.env
[[ -s $CONFIGFILE ]] && . $CONFIGFILE

## config defaults
PYTHONPATH=/opt/unoserver-client/lib
DEBUG=${DEBUG:-0}
LOG=${LOG:-/var/log/unoconv-remote/unoconv-remote.log}
UNOCONVERT=${UNOCONVERT:-/opt/unoserver-client/bin/unoconvert}
UNOCVERS=${UNOCVERS:-unoconv 0.7}
FORMATS=${FORMATS:-$(cat /var/lib/unoconv-remote/formats.txt)}
SERVER=${SERVER:-1.2.3.4}
PORT=${PORT:-2000}

## globals
HOST=${HOST:-$(uname -n)}
STARTTS=$(date +%FT%T)
ENDTS=$STARTTS
MISCHEAD="============== $HOST"
CONVHEAD="++++++++++++++ $HOST"
ENDCHEAD="-------------- $HOST"

export PYTHONPATH

function logmsg {
  echo "$@" >>$LOG 2>&1
}

function logparams {
  typeset -i _i=1
  for _p in "$@"; do
    print "\$$_i: '$_p'" >>$LOG 2>&1
    _i+=1
  done
}

case $1 in
  --version)
    logmsg "$MISCHEAD $STARTTS, version check"
    (( $DEBUG > 0 )) && logparams "$@"
    print "$UNOCVERS"
    ;;
  --show)
    logmsg "$MISCHEAD $STARTTS, show formats"
    (( $DEBUG > 0 )) && logparams "$@"
    print "$FORMATS" 1>&2
    ;;
  -f)
    logmsg "$CONVHEAD $STARTTS, conversion start"
    (( $DEBUG > 0 )) && logparams "$@"
    logmsg "Input:  '$5'"
    logmsg "Output: '$4' [$2] via $SERVER:$PORT"
    $UNOCONVERT \
      --host "$SERVER" \
      --port "$PORT" \
      --convert-to "$2" \
      "$5" - \
      2>>"$LOG" > "$4"
    EC=$?
    ENDTS=$(date +%FT%T)
    if (( EC )) && [[ -f "$4" ]]; then
      logmsg "WARNING - $ENDTS: non-zero exit for '$5', removing destination file '$4'"
      rm "$4"
    fi
    logmsg "$ENDCHEAD $ENDTS, conversion end in $SECONDS s, exit code $EC"
    ;;
  *)
    logmsg "$MISCHEAD $STARTTS, unknown 1st option"
    logparams "$@"
    ;;
esac

exit $EC

regebro commented 8 months ago

Glad to hear it!

It would be good to have some place to store configuration tips.