unoconv / unoserver

Can't convert doc to html #5

Closed zhongjin616 closed 1 year ago

zhongjin616 commented 2 years ago

Hi, I tested unoconvert with LibreOffice 7.1.7:

unoconvert 97html转换文档.doc 977.html

This causes an error:

INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening 97html转换文档.doc
Traceback (most recent call last):
  File "/root/miniconda3/envs/docvert/bin/unoconvert", line 9, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/docvert/lib/python3.8/site-packages/unoserver/converter.py", line 246, in main
    result = converter.convert(
  File "/root/miniconda3/envs/docvert/lib/python3.8/site-packages/unoserver/converter.py", line 186, in convert
    raise RuntimeError(
RuntimeError: Could not find an export filter from com.sun.star.text.TextDocument to graphic_HTML

But running libreoffice --headless --convert-to html html转换文档.doc succeeds.

regebro commented 2 years ago

Could you share the document? It doesn't seem to be a general problem.

lublak commented 2 years ago

@regebro I have the same issue with a pptx file. Just create a simple pptx and run unoserver's unoconvert on it: unoconvert somepresi.pptx somehtml.html. I can't share my file at the moment because it contains personal data (author, last editing user, etc.). The error is: Could not find an export filter from com.sun.star.presentation.PresentationDocument to generic_HTML

lublak commented 2 years ago

I can force it with: filtername = "impress_html_Export". But then I only get a single HTML file. It would be nice to support exporting to a folder with a complete HTML structure, or with embedded images.

regebro commented 2 years ago

@lublak LibreOffice sees presentations as a graphical format and HTML as a document format, so there wouldn't be much to convert at all. It can only do useful conversions of presentations to PDF, IMO.

What is it you are attempting to do?

lublak commented 2 years ago

@regebro

I am trying to export presentations as full web pages (automatically, in the background).

It is possible to convert a presentation to a complete HTML page via the user interface, so LibreOffice already has this capability. I just don't know how to do it from the command line.

[Screenshots of the LibreOffice export dialogs: Export, save as HTML, Standard HTML format]

regebro commented 2 years ago

Yeah, that includes defining export formats, etc, which I don't know how to do. It's possible it can be done if we implement support for filter flags, but I'm not sure even then.

If you can figure out how to do it with LibreOffice from the command line, I can look at implementing support for that.

felixble commented 2 years ago

Hi @regebro,

first of all thanks a lot for your great work with this package!

We are having the same issue when using unoconvert to convert odt to html.

Executing unoconvert --convert-to html test.odt test.html causes the following error:

INFO:unoserver:Starting unoconverter.
INFO:unoserver:Opening //b577c1152d56441fa928bd54d914ee07.odt
Traceback (most recent call last):
  File "/usr/local/bin/unoconvert", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/unoserver/converter.py", line 248, in main
    inpath=args.infile, outpath=args.outfile, convert_to=args.convert_to
  File "/usr/local/lib/python3.7/dist-packages/unoserver/converter.py", line 186, in convert
    f"Could not find an export filter from {import_type} to {export_type}"
RuntimeError: Could not find an export filter from com.sun.star.text.TextDocument to graphic_HTML

We can do it with LibreOffice from the command line with the following command: soffice --headless --convert-to html test.odt.

This produces the following output on the cli: convert /data/test.odt -> /data/test.html using filter : HTML (StarWriter).
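In case it helps with comparing what soffice picks against what unoconvert picks, here is a minimal sketch that pulls the filter name out of that output line. The helper name parse_filter_name and the regular expression are my own assumptions based only on the single line shown above; other LibreOffice versions may format the line differently.

```python
import re

# Assumed line format, taken from the one example in this comment:
#   convert /data/test.odt -> /data/test.html using filter : HTML (StarWriter)
CONVERT_LINE = re.compile(
    r"^convert (?P<src>.+?) -> (?P<dst>.+?) using filter : (?P<filter>.+?)\s*$"
)


def parse_filter_name(line):
    """Return the filter name from a soffice 'convert ...' line, or None."""
    match = CONVERT_LINE.match(line.strip())
    return match.group("filter") if match else None


print(parse_filter_name(
    "convert /data/test.odt -> /data/test.html using filter : HTML (StarWriter)"
))
# prints: HTML (StarWriter)
```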

It looks like there is an issue when figuring out the correct filter in https://github.com/unoconv/unoserver/blob/3e30d67387ebfa0041ec9e29a67b52ae0cd49d35/src/unoserver/converter.py#L87

EDIT: Hardcoding the variable filtername to "HTML (StarWriter)" at https://github.com/unoconv/unoserver/blob/3e30d67387ebfa0041ec9e29a67b52ae0cd49d35/src/unoserver/converter.py#L183 works in our tests: filtername = "HTML (StarWriter)"
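To illustrate the kind of fallback I have in mind, here is a rough sketch. It is not unoserver's actual code: HTML_FILTERS and pick_html_filter are hypothetical names, and the table only contains the two filter names that were reported as working in this thread.

```python
# Hypothetical lookup table: UNO document service -> HTML export filter name.
# Only the two filter names mentioned in this thread are listed.
HTML_FILTERS = {
    "com.sun.star.text.TextDocument": "HTML (StarWriter)",
    "com.sun.star.presentation.PresentationDocument": "impress_html_Export",
}


def pick_html_filter(import_type, detected_filter=None):
    """Prefer an automatically detected filter; otherwise use the table."""
    if detected_filter:
        return detected_filter
    try:
        return HTML_FILTERS[import_type]
    except KeyError:
        # Same wording as the error in the traceback, for unknown types.
        raise RuntimeError(
            f"Could not find an export filter from {import_type} to HTML"
        ) from None


print(pick_html_filter("com.sun.star.text.TextDocument"))
```

The idea is that automatic detection stays in charge, and the table is only consulted when detection finds nothing for an HTML target.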

Could you please have a look at this? Can this filter be added?

Thanks!

asmundstavdahl commented 1 year ago

Can be solved with #59