opensagres / xdocreport

XDocReport means XML Document reporting. It's a Java API to merge XML documents created with MS Office (docx), OpenOffice (odt) or LibreOffice (odt) with a Java model to generate reports and, if needed, convert them to another format (PDF, XHTML...).
https://github.com/opensagres/xdocreport

HTML Text Styling is not retaining source font size and formatting #245

Open Tatskaari opened 7 years ago

Tatskaari commented 7 years ago

Hi,

We're trying to merge some HTML fields into an ODT document. We expect the font face and text size to be determined by the style of the mail merge field. This is how it works for plain-text fields; however, HTML fields seem to set the style-name to XDocReport_%, which inherits from the default document styles. As a result, although the mail merge fields are set to use the Arial style, they come out in Calibri once merged.
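To make the inheritance problem concrete, here is a minimal, hypothetical model of how an ODF consumer resolves a style: it walks the style:parent-style-name chain and lets properties set closer to the element win. The style names (Standard, MergeField) and property values are assumptions for illustration, and real ODF styles keep these properties in a child style:text-properties element rather than a flat map. It shows why a style that only sets bold but inherits from the defaults ends up in Calibri, while the same bold rule reparented onto the field's style keeps Arial.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class StyleInheritanceDemo {

    // Hypothetical, simplified ODF style: a name, an optional parent style
    // and the properties the style sets explicitly.
    static final class Style {
        final String name;
        final String parent;
        final Map<String, String> props;

        Style(String name, String parent, Map<String, String> props) {
            this.name = name;
            this.parent = parent;
            this.props = props;
        }
    }

    // Resolves the effective properties by walking the parent chain:
    // ancestors are applied first, so the style's own properties override them.
    // Assumes the chain contains no cycles.
    static Map<String, String> effective(String name, Map<String, Style> styles) {
        if (name == null || !styles.containsKey(name)) {
            return new LinkedHashMap<>();
        }
        Style style = styles.get(name);
        Map<String, String> result = effective(style.parent, styles);
        result.putAll(style.props);
        return result;
    }

    // Sample styles (all names and values are hypothetical).
    static Map<String, Style> sampleStyles() {
        Map<String, Style> styles = new HashMap<>();
        // Document default: Calibri 11pt
        styles.put("Standard", new Style("Standard", null,
                Map.of("style:font-name", "Calibri", "fo:font-size", "11pt")));
        // Style applied to the mail merge field: Arial
        styles.put("MergeField", new Style("MergeField", "Standard",
                Map.of("style:font-name", "Arial")));
        // Bold style as generated: only sets bold and inherits from the defaults
        styles.put("XDocReport_Bold", new Style("XDocReport_Bold", "Standard",
                Map.of("fo:font-weight", "bold")));
        // Reparented copy: same bold rule, but inherits from the field's style
        styles.put("MergeField_XDocReport_Bold", new Style("MergeField_XDocReport_Bold", "MergeField",
                Map.of("fo:font-weight", "bold")));
        return styles;
    }

    public static void main(String[] args) {
        Map<String, Style> styles = sampleStyles();
        System.out.println(effective("XDocReport_Bold", styles).get("style:font-name"));            // Calibri
        System.out.println(effective("MergeField_XDocReport_Bold", styles).get("style:font-name")); // Arial
    }
}
```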

There was an issue on the old Google Code issue tracker that sounds very similar. Did anything come of it? https://code.google.com/archive/p/xdocreport/issues/363

Is it possible to do what we need, or is an enhancement required? Looking at the code, it seems we could generate a style per text element that inherits from the XDocReport_% style but also sets all the styles explicitly applied to the mail merge field. Would this be the best approach?

Tatskaari commented 7 years ago

Looking at the internals of the library, getting the style generator to produce styles that inherit from another style seems like quite an undertaking: it's nested behind a number of interfaces that are generic across many output formats. Instead, I've opted to do some post-processing that generates a bespoke style for each XDocReport style. It's not the prettiest code, but it works. Feel free to add it to the ODTReport post-processing method if you see fit.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import fr.opensagres.xdocreport.core.io.XDocArchive;

  /**
   * XDocReport styles HTML fields with a set of styles whose names start with XDocReport_, such as XDocReport_Bold.
   * These styles inherit from the default paragraph styles, so they ignore the style of the mail merge field. Another
   * issue is that nested styles do not stack: <b>bold <i>italic</i></b> renders as <b>bold</b> <i>italic</i>
   * because, unlike HTML and CSS, the styles are not automatically inherited from the enclosing element.
   *
   * This method walks the document and generates a new style for each element that uses an XDocReport style. The new
   * style inherits from the parent element's style but copies all the style rules of the XDocReport style.
   *
   * @param pXDocArchive the output archive from processing the mail merge fields. This archive is updated in place.
   */
  protected static void applyParentStyleToChildren(XDocArchive pXDocArchive)
  throws ParserConfigurationException, IOException, SAXException, TransformerException {
    DocumentBuilderFactory lDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
    lDocumentBuilderFactory.setNamespaceAware(true);
    DocumentBuilder lDocumentBuilder = lDocumentBuilderFactory.newDocumentBuilder();

    // styles.xml holds the base (named) styles for the document
    Document lStyleDoc = lDocumentBuilder
      .parse(pXDocArchive.getEntryInputStream("styles.xml"));

    // content.xml contains the text content as well as a number of "automatic" styles for each element. We generate
    // a new automatic style for each text node with an XDocReport style.
    Document lContentDoc = lDocumentBuilder
      .parse(pXDocArchive.getEntryInputStream("content.xml"));

    // recursively apply the parent styles to child nodes
    applyParentStyleToChildren(lContentDoc, lStyleDoc);

    // Serialize with the standard JAXP Transformer (the Xerces XMLSerializer is deprecated)
    // and write the changes back into the archive
    ByteArrayOutputStream lContentOutStream = new ByteArrayOutputStream();
    TransformerFactory.newInstance().newTransformer()
      .transform(new DOMSource(lContentDoc), new StreamResult(lContentOutStream));
    XDocArchive.setEntry(pXDocArchive, "content.xml", new ByteArrayInputStream(lContentOutStream.toByteArray()));
  }

  private static void applyParentStyleToChildren(Document pContentDoc, Document pStyleDoc){
    // Get the list of styles under the office:styles element and add them to lStyles
    Element lOfficeStyle = (Element) pStyleDoc.getDocumentElement().getElementsByTagName("office:styles").item(0);
    NodeList lNodeList = lOfficeStyle.getElementsByTagName("style:style");
    Map<String, Element> lStyles = new HashMap<>();

    for (int i = 0; i < lNodeList.getLength(); i++){
      Element lStyleElement = (Element) lNodeList.item(i);
      String lStyleName =  lStyleElement.getAttribute("style:name");
      lStyles.put(lStyleName, lStyleElement);
    }

    // Recurse through all the elements in the document body
    Element lBody = (Element) pContentDoc.getDocumentElement().getElementsByTagName("office:body").item(0);
    Map<String, Element> lNewStyles = new HashMap<>();
    updateChildStyles(lBody, null, lStyles, lNewStyles);

    // Copy the generated styles to the automatic style list in content.xml
    Element lAutomaticStyle = (Element) pContentDoc.getDocumentElement().getElementsByTagName("office:automatic-styles").item(0);
    for (Element lStyle : lNewStyles.values()){
      lAutomaticStyle.appendChild(pContentDoc.importNode(lStyle, true));
    }
  }

  private static void updateChildStyles(Element pElement, String pParentStyleName, Map<String, Element> pStyles, Map<String, Element> pNewStyles){
    String lTextNS = pElement.getOwnerDocument().lookupNamespaceURI("text");

    // Loop through each child node setting the style appropriately
    NodeList lChildren = pElement.getChildNodes();
    for(int i = 0; i < lChildren.getLength(); i++){
      if (lChildren.item(i) instanceof Element){
        Element lChildElement = (Element) lChildren.item(i);
        String lNextParentStyleName = pParentStyleName;
        if (lChildElement.hasAttribute("text:style-name")){
          String lChildStyleName = lChildElement.getAttribute("text:style-name");

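          // If this node uses an XDocReport style whose definition we found in styles.xml, generate a new style with
          // the XDocReport styling rules that inherits from pParentStyleName (otherwise getOrCreateStyle would hit a null)
          if (lChildStyleName.startsWith("XDocReport_") && pParentStyleName != null
              && pStyles.containsKey(lChildStyleName)){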
            String lNewStyleName = getOrCreateStyle(pParentStyleName, pStyles.get(lChildStyleName), pNewStyles);
            lChildElement.setAttributeNS(lTextNS, "text:style-name", lNewStyleName);

            // Any child should inherit from this new style
            lNextParentStyleName = lNewStyleName;
          } else {
            // Any child should inherit from this elements style
            lNextParentStyleName = lChildStyleName;
          }
        }

        // Finally we should update any children of this element
        updateChildStyles(lChildElement, lNextParentStyleName, pStyles, pNewStyles);
      }
    }
  }

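  /**
   * Returns the name of a style that copies the style rules of pStyleToMerge but inherits from pParentStyleName,
   * generating that style on first use and reusing it afterwards.
   *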
   * @param pParentStyleName The containing element's style name
   * @param pStyleToMerge The XDocReport style which we're making inherit pParentStyleName
   * @param pNewStyles A map of the styles we've already generated
   * @return the new style name
   */
  private static String getOrCreateStyle(String pParentStyleName, Element pStyleToMerge, Map<String, Element> pNewStyles){
    String lStyleNsUri = pStyleToMerge.getOwnerDocument().lookupNamespaceURI("style");
    String lStyleToMergeName = pStyleToMerge.getAttribute("style:name");
    String lNewStyleName = pParentStyleName + "_" + lStyleToMergeName;

    // if we've not already generated this style then copy the XDocReport style making it inherit from pParentStyleName
    if (!pNewStyles.containsKey(lNewStyleName)){
      Element lNewStyle = (Element) pStyleToMerge.cloneNode(true);
      lNewStyle.setAttributeNS(lStyleNsUri, "style:name", lNewStyleName);
      lNewStyle.setAttributeNS(lStyleNsUri, "style:parent-style-name", pParentStyleName);
      pNewStyles.put(lNewStyleName, lNewStyle);
    }

    return lNewStyleName;
  }
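To see why this walk makes nested styles stack, here is a stripped-down, dependency-free sketch of the same logic as updateChildStyles() and getOrCreateStyle(). Node is a hypothetical stand-in for a DOM element carrying text:style-name, P1 is an assumed paragraph style name, and the map only records each generated style's parent instead of cloning the full style element. Each XDocReport_ style is renamed to "<parent>_<XDocReport style>", and that generated name becomes the parent for deeper elements, so an italic span inside a bold span ends up inheriting the bold rule.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedStyleDemo {

    // Hypothetical stand-in for a DOM element with an optional text:style-name
    static final class Node {
        String styleName; // null when the element has no style
        final List<Node> children = new ArrayList<>();

        Node(String styleName, Node... kids) {
            this.styleName = styleName;
            Collections.addAll(children, kids);
        }
    }

    // Mirrors updateChildStyles(): newStyles maps each generated style name to its parent style
    static void rewrite(Node node, String parentStyle, Map<String, String> newStyles) {
        for (Node child : node.children) {
            String nextParent = parentStyle;
            if (child.styleName != null) {
                if (child.styleName.startsWith("XDocReport_") && parentStyle != null) {
                    String generated = parentStyle + "_" + child.styleName;
                    newStyles.putIfAbsent(generated, parentStyle); // reuse a style generated earlier
                    child.styleName = generated;
                    nextParent = generated;
                } else {
                    nextParent = child.styleName;
                }
            }
            rewrite(child, nextParent, newStyles);
        }
    }

    public static void main(String[] args) {
        // <text:p P1><span XDocReport_Bold>bold <span XDocReport_Italic>italic</span></span></text:p>
        Node italic = new Node("XDocReport_Italic");
        Node bold = new Node("XDocReport_Bold", italic);
        Node body = new Node(null, new Node("P1", bold));

        Map<String, String> newStyles = new LinkedHashMap<>();
        rewrite(body, null, newStyles);

        System.out.println(bold.styleName);   // P1_XDocReport_Bold
        System.out.println(italic.styleName); // P1_XDocReport_Bold_XDocReport_Italic
    }
}
```

Because the generated name encodes the whole ancestry, the same combination (for example bold inside P1) is generated once and shared by every element that needs it.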