jodconverter / jodconverter

JODConverter automates document conversions using LibreOffice or Apache OpenOffice.
https://github.com/jodconverter/jodconverter

from html to ODT with embedded pictures #110

Closed: surli closed this issue 5 years ago

surli commented 6 years ago

Hi,

I'm trying to create an ODT document from HTML containing pictures. The links to the pictures are local, so I'd like to embed the pictures directly in the ODT. We previously used the ImageEmbedderFilter provided by the XWiki fork, so I wonder whether the same kind of feature exists in your fork?

I noticed there is a GraphicInserterFilter, but from what I understand it requires the paths of the images to embed to be specified explicitly. Here I want to process all images in the document. Any hints?

sbraconnier commented 6 years ago

Hi there,

The filters feature of jodconverter was inspired by the XWiki fork, but I didn't want to apply those filters by default. So you can simply keep your own ImageEmbedderFilter and add it to the filter chain when creating your converter.

I looked at the commits you have made so far in the XWiki project, so here is what I would do:

Put the following class somewhere in your project. It is a copy of the XWiki ImageEmbedderFilter, modified so it can be used with my fork of jodconverter:

import org.jodconverter.filter.Filter;
import org.jodconverter.filter.FilterChain;
import org.jodconverter.office.LocalOfficeContext;
import org.jodconverter.office.OfficeContext;

import com.sun.star.awt.Size;
import com.sun.star.beans.PropertyValue;
import com.sun.star.beans.XPropertySet;
import com.sun.star.container.XIndexAccess;
import com.sun.star.graphic.XGraphicProvider;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XServiceInfo;
import com.sun.star.text.XTextGraphicObjectsSupplier;
import com.sun.star.uno.UnoRuntime;

/** This filter is used to embed external images into a document. */
public class ImageEmbedderFilter implements Filter {

  @Override
  public void doFilter(
      final OfficeContext context, final XComponent document, final FilterChain chain)
      throws Exception {

    if (UnoRuntime.queryInterface(XServiceInfo.class, document)
        .supportsService("com.sun.star.text.GenericTextDocument")) {
      embedWriterImages(document, context);
    }

    // Invoke the next filter in the chain
    chain.doFilter(context, document);
  }

  private void embedWriterImages(final XComponent document, final OfficeContext context)
      throws Exception {

    final LocalOfficeContext localContext = (LocalOfficeContext) context;

    final XIndexAccess indexAccess =
        UnoRuntime.queryInterface(
            XIndexAccess.class,
            UnoRuntime.queryInterface(XTextGraphicObjectsSupplier.class, document)
                .getGraphicObjects());
    final XGraphicProvider graphicProvider =
        UnoRuntime.queryInterface(
            XGraphicProvider.class,
            localContext
                .getComponentContext()
                .getServiceManager()
                .createInstanceWithContext(
                    "com.sun.star.graphic.GraphicProvider", localContext.getComponentContext()));
    final PropertyValue[] queryProperties = new PropertyValue[] {new PropertyValue()};
    queryProperties[0].Name = "URL";
    for (int i = 0; i < indexAccess.getCount(); i++) {
      try {
        final XPropertySet graphicProperties =
            UnoRuntime.queryInterface(XPropertySet.class, indexAccess.getByIndex(i));
        final String graphicURL = (String) graphicProperties.getPropertyValue("GraphicURL");
        if (!graphicURL.contains("vnd.sun.star.GraphicObject")) {
          queryProperties[0].Value = graphicURL;
          // Before embedding the image, the "ActualSize" property holds the image
          // size specified in the document content. If the width or height are not
          // specified then their actual values will be 0.
          final Size specifiedSize =
              UnoRuntime.queryInterface(
                  Size.class, graphicProperties.getPropertyValue("ActualSize"));
          graphicProperties.setPropertyValue(
              "Graphic", graphicProvider.queryGraphic(queryProperties));
          // Images are embedded as characters (see TextContentAnchorType.AS_CHARACTER)
          // and their size is messed up if it's not explicitly specified (e.g. if the
          // image height is not specified then it takes the line height).
          adjustImageSize(graphicProperties, specifiedSize);
        }
      } catch (Exception e) {
        // Skip this graphic.
      }
    }
  }

  private void adjustImageSize(final XPropertySet graphicProperties, final Size specifiedSize) {

    try {
      // After embedding the image, the "ActualSize" property holds the actual image size.
      final Size size =
          UnoRuntime.queryInterface(Size.class, graphicProperties.getPropertyValue("ActualSize"));
      // Compute the width and height if not specified, preserving aspect ratio.
      if (specifiedSize.Width == 0 && specifiedSize.Height == 0) {
        specifiedSize.Width = size.Width;
        specifiedSize.Height = size.Height;
      } else if (specifiedSize.Width == 0) {
        specifiedSize.Width = specifiedSize.Height * size.Width / size.Height;
      } else if (specifiedSize.Height == 0) {
        specifiedSize.Height = specifiedSize.Width * size.Height / size.Width;
      }
      graphicProperties.setPropertyValue("Size", specifiedSize);
    } catch (Exception e) {
      // Ignore this image.
    }
  }
}
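For reference, the size arithmetic in adjustImageSize can be isolated and checked without a running office instance. The sketch below is a hypothetical standalone helper (ImageSizes.resolve is not part of jodconverter or XWiki): it takes the specified width and height (0 meaning "not specified") plus the natural image size, and fills in the missing dimension while preserving the aspect ratio. Unlike the filter above, it uses a long intermediate product to avoid int overflow with large values; that is a choice of this sketch, not of the original code.

```java
/** Hypothetical helper mirroring the arithmetic of adjustImageSize (not a jodconverter API). */
final class ImageSizes {
  private ImageSizes() {}

  /**
   * Returns {width, height}. A 0 in specifiedWidth or specifiedHeight means "not specified".
   * Throws ArithmeticException if the natural dimension used as divisor is 0 (the filter
   * above swallows that case in its catch block).
   */
  static int[] resolve(int specifiedWidth, int specifiedHeight, int actualWidth, int actualHeight) {
    if (specifiedWidth == 0 && specifiedHeight == 0) {
      // Nothing specified: fall back to the natural image size.
      return new int[] {actualWidth, actualHeight};
    }
    if (specifiedWidth == 0) {
      // Only the height is known: derive the width from the aspect ratio.
      return new int[] {(int) ((long) specifiedHeight * actualWidth / actualHeight), specifiedHeight};
    }
    if (specifiedHeight == 0) {
      // Only the width is known: derive the height from the aspect ratio.
      return new int[] {specifiedWidth, (int) ((long) specifiedWidth * actualHeight / actualWidth)};
    }
    // Both dimensions specified: keep them unchanged.
    return new int[] {specifiedWidth, specifiedHeight};
  }
}
```

For example, for a 4000 x 3000 image where only a height of 1500 is specified, the resolved size is 2000 x 1500.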

Then, in your DefaultOfficeServer class, create the converter like this:

...
// Try to use the JSON document format registry to configure the office document conversion.
InputStream input = getClass().getResourceAsStream(DOCUMENT_FORMATS_PATH);
if (input != null) {
  try {
    this.jodConverter =
        LocalConverter.builder()
            .officeManager(this.jodManager)
            .formatRegistry(JsonDocumentFormatRegistry.create(input))
            .filterChain(new ImageEmbedderFilter())
            .build();
  } catch (Exception e) {
    this.logger.warn(
        "Failed to parse {} . The default document format registry will be used instead.",
        DOCUMENT_FORMATS_PATH,
        e);
  }
} else {
  this.logger.debug(
      "{} is missing. The default document format registry will be used instead.",
      DOCUMENT_FORMATS_PATH);
}
if (this.jodConverter == null) {
  // Use the default document format registry.
  this.jodConverter =
      LocalConverter.builder()
          .officeManager(this.jodManager)
          .filterChain(new ImageEmbedderFilter())
          .build();
}
...

It should then work just fine.

I'll let you know if I find another way to do this.

surli commented 6 years ago

Apparently it's not working in my test: I get an exception when reading the GraphicURL value:

com.sun.star.uno.RuntimeException: Getting from this property is not supported

That's not related to your code, then, but in case you have an idea... :)

I do have one small issue with your code, though: we always reuse the same converter instance, and from what I understand your fork was not designed for that. Still, is there a way to reset a converter's filter chain?

Right now, if I call the converter twice, the chain is consumed by the first call and my filter is not invoked the second time.

sbraconnier commented 6 years ago

Could you please attach a small zip here with a sample document I could use to reproduce the RuntimeException? I'm not able to reproduce it!

As for your problem with the filter chain, it looks like a bug to me. The filter chain should be copied for each conversion, but instead the same chain instance is reused for every conversion. I'll fix this and release a new version this week. In the meantime, here is how you can reset the filter chain yourself if you want to test your work before the next release:

DefaultFilterChain chain = new DefaultFilterChain(new ImageEmbedderFilter());
LocalConverter converter = LocalConverter.builder().filterChain(chain).build();
converter.convert(...); // First conversion
chain.reset(); // Reset the chain
converter.convert(...); // Another conversion

So you can keep a reference to the filter chain you created and reset it between conversions. Not very pleasant, but it works!
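The "consumed chain" behaviour comes from the chain keeping track of which filter runs next. The following is a simplified, hypothetical model (SimpleFilter and SimpleChain are invented names, not jodconverter classes) showing why a second run does nothing, and how both the reset() workaround and a copy-per-conversion approach bring the filter back.

```java
/** Hypothetical filter step; stands in for something like ImageEmbedderFilter. */
interface SimpleFilter {
  void doFilter(StringBuilder document, SimpleChain chain);
}

/** Minimal stateful chain: it remembers the position of the next filter to run. */
final class SimpleChain {
  private final SimpleFilter[] filters;
  private int position;

  SimpleChain(SimpleFilter... filters) {
    this.filters = filters.clone();
  }

  /** Runs the next filter, if any; each filter calls this to pass control along. */
  void doFilter(StringBuilder document) {
    if (position < filters.length) {
      filters[position++].doFilter(document, this);
    }
  }

  /** Rewinds the chain so the same instance can be run again (the reset() workaround). */
  void reset() {
    position = 0;
  }

  /** Returns a fresh chain with the same filters (the copy-per-conversion approach). */
  SimpleChain copy() {
    return new SimpleChain(filters);
  }
}
```

With copy(), each conversion starts from a fresh position, so a converter instance can be reused without anyone having to remember to call reset().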

surli commented 6 years ago

I committed my code here: https://github.com/surli/testjodconverter

It's expected that you did not get the exception with the previous code: ImageEmbedderFilter was swallowing it. I just stopped swallowing exceptions to see what really happens.

Thanks for the hint about the filter chain, I'll check that.

sbraconnier commented 6 years ago

Hmm, I'm not getting any exception, and the document is converted successfully with the embedded image. Could you please provide more information: OS and version, LibreOffice or OpenOffice and version, etc.? Thanks.

surli commented 6 years ago

I'm on ArchLinux. Here's my version of LibreOffice:

Version: 6.1.2.1
Build ID: 6.1.2-4

Here's the version of the JDK I'm using:

java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)

sbraconnier commented 6 years ago

I was using the LibreOffice 6.0.x branch. I can confirm that I get the RuntimeException with LibreOffice 6.1.2 and 6.1.3... Something must have changed! I'll investigate and report back here with what I find.

sbraconnier commented 6 years ago

Well, I may be wrong, but it seems that with LibreOffice 6.2.1+ you don't need the ImageEmbedderFilter at all!! Just comment out the following line (44):

.filterChain(new ImageEmbedderFilter())

in your sample project and the output file will have the image embedded.

surli commented 6 years ago

I just tested it again and the image is not embedded: the document references the file on the local file system. If I delete the picture locally, it's no longer displayed in the .odt.

sbraconnier commented 6 years ago

Hmm, OK then! I'll keep investigating. In any case, we must find a solution that works with both the 6.0.x and 6.2.1+ branches.

surli commented 6 years ago

I just found this regarding image handling in the 6.1.x branch: https://wiki.documentfoundation.org/ReleaseNotes/6.1#Image_handling_rework

Edit: I may have found a solution, explained in https://tomazvajngerl.blogspot.com/2018/03/improving-image-handling-in-libreoffice.html

All these properties are now deprecated and removed and an alternative was added (where needed) that uses the XGraphic or XBitmap types (they use the same implementation so either can be used). This was done as following:

GraphicURL -> Graphic (type XGraphic) and GraphicBitmap (type XBitmap) for bullets

sbraconnier commented 6 years ago

Thanks for that! It will help! According to this, the GraphicURL property has been removed. I'll try to find another way to embed images.

sbraconnier commented 6 years ago

Well! It's not an easy one!! Stay tuned, I'll find a way!

sbraconnier commented 6 years ago

I must surrender... :(

I've filed a bug with LibreOffice about this.

surli commented 6 years ago

Thanks for trying anyway! You know the API better than I do, so I'll wait for an answer on the bug you opened. Is it OK to leave this ticket open until we get a solution?

sbraconnier commented 6 years ago

Of course!

sbraconnier commented 5 years ago

Could you please test the 4.2.2-SNAPSHOT version? It is available in the OSS snapshot Maven repository.

This should fix the filter chain problem (#112).

It should also handle graphics with both newer and older versions of LibreOffice. You can check how I did it in LinkedImagesEmbedderFilter: we have to check the office product (LO or AOO) and its version.
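The actual logic of LinkedImagesEmbedderFilter isn't shown in this thread. As a rough, hypothetical illustration of the product/version check it mentions, the sketch below picks between the pre-6.1 GraphicURL approach and the Graphic property. It assumes, based on the LibreOffice 6.1 release notes linked earlier in this thread, that LibreOffice 6.1+ no longer supports reading GraphicURL, and it assumes Apache OpenOffice still does. GraphicStrategy and its Kind enum are invented names, and how the product name and version are obtained from the office installation is deliberately left out.

```java
/** Hypothetical decision helper; not jodconverter's LinkedImagesEmbedderFilter. */
final class GraphicStrategy {
  private GraphicStrategy() {}

  enum Kind { LEGACY_GRAPHIC_URL, GRAPHIC_PROPERTY }

  /** Chooses an embedding approach from the office product name and version string. */
  static Kind choose(String product, String version) {
    if (!"LibreOffice".equalsIgnoreCase(product)) {
      // Assumption: Apache OpenOffice still exposes the GraphicURL property.
      return Kind.LEGACY_GRAPHIC_URL;
    }
    int[] v = parse(version);
    // Assumption from the 6.1 release notes: GraphicURL was replaced by Graphic in 6.1.
    boolean atLeast61 = v[0] > 6 || (v[0] == 6 && v[1] >= 1);
    return atLeast61 ? Kind.GRAPHIC_PROPERTY : Kind.LEGACY_GRAPHIC_URL;
  }

  /** Parses "major.minor[.more]" into {major, minor}; expects a numeric major version. */
  static int[] parse(String version) {
    String[] parts = version.trim().split("\\.");
    int major = Integer.parseInt(parts[0]);
    int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    return new int[] {major, minor};
  }
}
```

With these assumptions, LibreOffice 6.1.2.1 (the version reported above) maps to the Graphic property, while 6.0.x and 4.2.x map to the legacy GraphicURL path.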

You can use this filter in your code to embed linked images. Now your code should look like this, without any custom filter on your side:

...
// Try to use the JSON document format registry to configure the office document conversion.
InputStream input = getClass().getResourceAsStream(DOCUMENT_FORMATS_PATH);
if (input != null) {
  try {
    this.jodConverter =
        LocalConverter.builder()
            .officeManager(this.jodManager)
            .formatRegistry(JsonDocumentFormatRegistry.create(input))
            .filterChain(new LinkedImagesEmbedderFilter())
            .build();
  } catch (Exception e) {
    this.logger.warn(
        "Failed to parse {} . The default document format registry will be used instead.",
        DOCUMENT_FORMATS_PATH,
        e);
  }
} else {
  this.logger.debug(
      "{} is missing. The default document format registry will be used instead.",
      DOCUMENT_FORMATS_PATH);
}
if (this.jodConverter == null) {
  // Use the default document format registry.
  this.jodConverter =
      LocalConverter.builder()
          .officeManager(this.jodManager)
          .filterChain(new LinkedImagesEmbedderFilter())
          .build();
}
...

Note that the filter I made doesn't "fix" the image size like the XWiki filter did. I didn't need to: the image looked fine as is. If you can reproduce a problem with the image size, please let me know.

surli commented 5 years ago

Nice! I just tested it locally on my small example and it works. I'll try it on XWiki ASAP to check that everything's alright. When do you expect to make the new release?

sbraconnier commented 5 years ago

I have to add some unit tests first, but it will be sometime next week :)

surli commented 5 years ago

Awesome! I'll keep you posted on testing the latest snapshot on XWiki.

surli commented 5 years ago

BTW, do you only support the latest versions of LO and AOO, or do you maintain the converter for a set of versions? Is this documented somewhere?

surli commented 5 years ago

So FTR, I just tried the latest snapshot on XWiki and it's all green for us.

sbraconnier commented 5 years ago

The 4.2.2 version has been released!

sbraconnier commented 5 years ago

@surli

I don't have a set of supported versions, but the LibreOffice installed on the Travis CI server is 4.2.x, so when I push my changes to GitHub, they are tested against LO 4.2.x. This is also why the release only happened today: the build was broken for two days because of the latest changes (LinkedImagesEmbedderFilter).

surli commented 5 years ago

OK, thanks for the info and for taking the time on this issue :) FTR, and in case you're interested, we will set up tests on XWiki to check that our code is compatible with various versions of LO, using Testcontainers with different Docker containers running LO.

sbraconnier commented 5 years ago

Wow, thanks for the info. I'll definitely take a look!