unoconv / unoserver

Unable to set custom filter `EmbedImages` #75

Closed Acconut closed 10 months ago

Acconut commented 12 months ago

First of all, thank you for unoserver! It is a very helpful tool.

I am trying to convert a .doc file (available at https://file-examples.com/wp-content/storage/2017/02/file-sample_100kB.doc) to an HTML file. The document contains images, and I would like to embed them in the HTML.

I am able to achieve this by setting the EmbedImages filter option when using LibreOffice 7.4.7.2 directly:

soffice.bin --convert-to html:HTML:EmbedImages=true --outdir ./soffice/ ./file-sample_100kB.doc

When inspecting the output HTML, we can see that the images are embedded as Base64 blobs:

$ cat ./soffice/file-sample_100kB.html | grep img
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA[...]

However, I am not able to achieve the same using the new --filter-options parameter from @mitya57's #73:

unoconvert --port 3030 --convert-to html --filter HTML --filter-options EmbedImages=true ./file-sample_100kB.doc ./unoserver/out.html

The output HTML contains references to the image files instead of being embedded:

$ cat ./unoserver/out.html | grep img
<img src="out_html_cfebe7d79adb9f29.gif" name="Object1" align="left" width="430" height="216"/>
<img src="out_html_88d1d80d8cf64c87.jpg" name="Image1" align="left" width="642" height="429" border="0"/>
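
The grep checks above can also be scripted. The following is a minimal sketch, not part of unoserver: it uses only the Python standard library to split the `<img>` sources of a converted file into embedded ones (`data:` URIs) and linked ones (file references). The names `ImgSrcCollector` and `classify_images` are made up for this illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, and routes
        # self-closing tags such as <img .../> through this method too.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def classify_images(html_text):
    """Return (embedded, linked) lists of <img> sources.

    "Embedded" means the src is a data: URI; everything else is treated
    as a reference to a separate file or URL.
    """
    collector = ImgSrcCollector()
    collector.feed(html_text)
    embedded = [s for s in collector.sources if s.startswith("data:")]
    linked = [s for s in collector.sources if not s.startswith("data:")]
    return embedded, linked
```

Calling `classify_images` on the contents of ./unoserver/out.html should list the two out_html_... files under `linked`, while the soffice output should only produce `embedded` entries.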

I also tried using --filter "HTML (StarWriter)" or leaving out the --convert-to and/or --filter flags entirely, but without success. Since it works when using LibreOffice directly, I suspect that EmbedImages is somehow not forwarded to LibreOffice properly. The same happens with the SkipImages flag, which is not applied when using unoconvert.

I would appreciate any help here.

regebro commented 12 months ago

In the version of LibreOffice I have installed, there are no flags for the HTML filters, and I cannot make it embed images with --convert-to html:HTML:EmbedImages=true. However, if I use --convert-to xhtml:HTML, it will embed the images.

For unoserver, the filter name is "XHTML Writer File", so unoconvert image.doc image.html --filter "XHTML Writer File" works for me.

Acconut commented 8 months ago

Thank you for the quick response, I will try it out :)

m-bagheri commented 6 months ago

Hi @regebro

I followed what you suggested, but I still have the same issue: the images are not embedded.

I have tried all of the following:

unoconvert image.doc image.html --filter "XHTML Writer File"
unoconvert image.doc image.html --filter "XHTML Writer File" --filter-options EmbedImages=true
unoconvert image.doc image.html --convert-to xhtml --filter "XHTML Writer File" --filter-options EmbedImages=true
unoconvert image.doc image.html --convert-to xhtml
unoconvert image.doc image.html --convert-to xhtml --filter "XHTML Writer File"
unoconvert image.doc image.html --convert-to html --filter "XHTML Writer File" --filter-options EmbedImages=true
unoconvert image.doc image.html --convert-to html --filter "XHTML Writer File"
unoconvert image.doc image.html --convert-to html --filter "HTML" --filter-options EmbedImages=true
unoconvert image.doc image.html --convert-to html --filter "HTML (StarWriter)" --filter-options EmbedImages=true

None of these worked for me; the images still appear unembedded. Some of the commands also fail straight away and do not work at all.

These are my environment details:

Any thoughts?

regebro commented 6 months ago

Can you attach an example document that doesn't work for you?

m-bagheri commented 6 months ago

Here it is @regebro

Thanks.

test.docx

m-bagheri commented 6 months ago

I tried with the following version too, with the same result.

m-bagheri commented 6 months ago

@regebro I found that if I move the image from the footer to the body of the document, your solution works. However, that filter completely drops the header and footer of the document, while the HTML (StarWriter) filter keeps the header and footer but does not embed the image.

Maybe that's a better explanation of the issue; I just realised it.

regebro commented 6 months ago

With LibreOffice 7.6.3.2 60(Build:2) this works for me:

unoconvert image.doc image.html --filter "HTML"

Also, unoconvert image.doc image.html --filter "HTML (StarWriter)" works; I think it's the same filter.

I'm not sure any of the HTML filters has usable options in these versions. The XHTML filter will ignore the footer and embed images, and the HTML filter will include the footer but not embed the images. LibreOffice doesn't show any filter options for these, so I don't think there are any.
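
If the HTML filter's output (footer kept, images written as separate files) is otherwise acceptable, one possible workaround is to inline the images after conversion. The sketch below is an assumption-laden illustration, not something unoserver or LibreOffice provides. It assumes the image files sit next to the HTML file, as in the out_html_... example earlier in this thread, and that `src` attributes are double-quoted. The function name `inline_images` is hypothetical.

```python
import base64
import mimetypes
import re
from pathlib import Path
from urllib.parse import unquote

# Matches <img ... src="..."> with a double-quoted src (assumption: the
# converter always writes double quotes, as in the output shown above).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def inline_images(html_text, base_dir):
    """Replace local <img src="..."> references with base64 data URIs."""
    base_dir = Path(base_dir)

    def replace(match):
        src = match.group(2)
        # Leave already-embedded and remote images untouched.
        if src.startswith(("data:", "http:", "https:")):
            return match.group(0)
        path = base_dir / unquote(src)
        # Leave references to files we cannot find untouched.
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{match.group(1)}data:{mime};base64,{payload}{match.group(3)}"

    return IMG_SRC.sub(replace, html_text)
```

For example, reading image.html, passing its text and its parent directory to `inline_images`, and writing the result back would produce a single self-contained file. A regular expression is used here only to keep the sketch short; a real HTML parser would be more robust.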