Open gjmkoper opened 1 year ago
Hello,
Yeah... Unfortunately, the converted documents look so bad that I told my parents to just convert to PDF before uploading. The conversion from docx to PDF happens in this command using pandoc in printer/file_printer:
The problem is that pandoc doesn't preserve the original document's formatting. I haven't had much time to work on this project, so I haven't looked for an alternative approach yet, although I have a hunch there are powerful command-line tools for doing the conversion. If you know of any command-line tools I could use to convert formats, please let me know.
What about this one? https://www.npmjs.com/package/docx-pdf There are no usage reports but it is increasingly downloaded. I can try it out later this week.
Thanks, but I'd really like to avoid requiring npm and Node on top of the Python stack, not to mention the additional scripts needed to invoke docx-pdf from the web application.
I ended up using LibreOffice. The conversion is not perfect but it preserves more of the original formatting. If you want to try it out and let me know what you think, do a git pull and update your Python dependencies (pip install -r requirements). You will also need to install a few additional packages to use the libreoffice command (listed under Step 2 in README). If you're using a Raspberry Pi, you just run the interactive Django shell and run exec(open('initial_setup.py').read()) to install the libreoffice and default-jre packages.
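For anyone following along, here is a rough sketch of what a headless LibreOffice conversion can look like when called from Python. It is not the code in the repository: the function names build_convert_command and convert_to_pdf, the default binary name, and the timeout are assumptions made for illustration. Only the libreoffice flags (--headless, --convert-to, --outdir) are standard.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, out_dir, binary="libreoffice"):
    """Return the argv list for a headless LibreOffice conversion to PDF.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return [
        binary,
        "--headless",             # run without opening a GUI window
        "--convert-to", "pdf",    # target format
        "--outdir", str(out_dir),
        str(source),
    ]


def convert_to_pdf(source, out_dir, binary="libreoffice", timeout=120):
    """Convert `source` to PDF in `out_dir` and return the expected output path.

    Assumes the LibreOffice binary is on PATH under the name `binary`.
    """
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(
        build_convert_command(source, out_dir, binary),
        check=True,
        timeout=timeout,
    )
    # LibreOffice names the output after the input file's stem.
    return out_dir / (Path(source).stem + ".pdf")
```

Keeping the command construction separate from the subprocess call makes it easy to check the arguments without LibreOffice installed. On some systems the binary is called soffice instead, which is why it is a parameter here.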
Thanks for this contribution, it works great for PDFs. Unfortunately, the docx is converted to an odd-looking document style that I remember from using LaTeX long ago for my PhD thesis and afterwards. Is there a way to change these settings, which I believe are the defaults?