yawlfoundation / yawl

Yet Another Workflow Language
http://www.yawlfoundation.org
GNU Lesser General Public License v3.0

Newline characters in YAWL file decomposition extended attributes lost after save to database #609

Closed mlawry closed 7 years ago

mlawry commented 7 years ago

I have a YAWL specification file (XML) with a decomposition element that looks like this:

<decomposition id="begin" xsi:type="WebServiceGatewayFactsType"
               instructions="Line1&#xA;Line2&#xA;Line3">
  <name>Empty Task</name>
  <externalInteraction>automated</externalInteraction>
</decomposition>

The instructions="..." attribute is an extended attribute containing encoded newline (\n) characters. Once the specification XML is uploaded to YAWL (e.g. via resourceService) and saved to the database (specifications table), the newlines are lost: they are converted to spaces.

I traced the problem to the YAttributeMap class, whose toXML() method is called when saving the specification to the database. The rough call stack (innermost call first) is:

YAttributeMap.toXML(String key)
YAttributeMap.toXML()
YSpecification.toXML()
YMarshal.marshal(YSpecification specification)
YSpecification.getPersistedXML()
...hibernate code...

The issue seems to be the YAttributeMap.toXML(String key) method, which calls JDOMUtil.encodeEscapes(String s) to encode the attribute value. JDOMUtil.encodeEscapes(String s) does not encode newlines, so when the attribute value contains newlines, the resulting Java string looks similar to this:

String xml = "<decomposition instructions=\"Line1\nLine2\nLine3\">Line1\nLine2\nLine3</decomposition>";

Later, in the YMarshal class, the JDOMUtil.formatXMLStringAsDocument(String s) method is called to format the output of YSpecification.toXML(). This formatting step replaces the unencoded newlines inside the attribute value with spaces (an XML parser normalizes literal newlines in attribute values to spaces). You can see this clearly by running the following example code:

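import org.yawlfoundation.yawl.util.JDOMUtil;

public class Test {
    public static void main(String[] args) {
        // The same literal newlines appear in the attribute value and in the element text.
        String xml = "<decomposition instructions=\"Line1\nLine2\nLine3\">Line1\nLine2\nLine3</decomposition>";
        // Format the string the same way YMarshal does before the specification is persisted.
        System.out.println(JDOMUtil.formatXMLStringAsDocument(xml));
    }
}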

The output from running the above Test is:

<?xml version="1.0" encoding="UTF-8"?>
<decomposition instructions="Line1 Line2 Line3">Line1
Line2
Line3</decomposition>

The instructions attribute no longer contains any newline characters, while the element text keeps them.
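
The same effect can be reproduced without YAWL or JDOM on the classpath. The following is a minimal sketch that uses the JDK's built-in DOM parser as a stand-in for whatever parsing JDOMUtil.formatXMLStringAsDocument performs internally (that substitution is an assumption, as are the class and method names AttributeNormalizationDemo and parseInstructions). It compares a literal newline with an &#xA; character reference in the attribute value:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AttributeNormalizationDemo {

    // Hypothetical helper: parses the XML string and returns the value of the
    // root element's "instructions" attribute as reported by the parser.
    static String parseInstructions(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("instructions");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Literal newlines inside the attribute value.
        String literal = "<decomposition instructions=\"Line1\nLine2\nLine3\"/>";
        // Newlines written as character references.
        String encoded = "<decomposition instructions=\"Line1&#xA;Line2&#xA;Line3\"/>";

        // Literal newlines are normalized to spaces by the parser.
        System.out.println(parseInstructions(literal).contains("\n"));  // prints false
        // Character references survive parsing as real newlines.
        System.out.println(parseInstructions(encoded).contains("\n"));  // prints true
    }
}
```

Only the character-reference form survives a parse, which is why the attribute value has to be encoded before the XML string is formatted.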

It seems the "correct" behaviour (e.g. according to https://stackoverflow.com/questions/2004386/how-to-save-newlines-in-xml-attribute) is for newlines in attribute values to be encoded as &#xA;. JDOMUtil.encodeEscapes(String s) does not do this, probably because it only encodes for element text rather than attribute values. An appropriate fix would be to add an "encodeAttributeEscapes" method or equivalent, something like the following (using jdom2 to do the heavy lifting):

import org.jdom2.output.EscapeStrategy;
import org.jdom2.output.Format;

// Escapes a string for use as an XML attribute value, letting jdom2 handle the encoding.
public String encodeAttributeEscapes(String value) {
    EscapeStrategy strategy = Format.getRawFormat().getEscapeStrategy();
    return Format.escapeAttribute(strategy, value);
}

Then, instead of calling JDOMUtil.encodeEscapes(String s), the YAttributeMap class would call this new method.
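
To illustrate what attribute-level escaping needs to cover, here is a dependency-free sketch. It is not the jdom2 implementation and not the actual YAWL fix; the class AttributeEscapeSketch and the helper escapeAttributeValue are hypothetical names used only for this example. It assumes the attribute value will be wrapped in double quotes:

```java
public class AttributeEscapeSketch {

    // Hypothetical helper (not part of YAWL or jdom2): escapes a string for
    // use inside a double-quoted XML attribute value. Newline, carriage
    // return and tab are written as character references so that an XML
    // parser does not normalize them to spaces.
    static String escapeAttributeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\n': sb.append("&#xA;");  break;
                case '\r': sb.append("&#xD;");  break;
                case '\t': sb.append("&#x9;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<decomposition instructions=\""
                + escapeAttributeValue("Line1\nLine2\nLine3") + "\"/>";
        // prints <decomposition instructions="Line1&#xA;Line2&#xA;Line3"/>
        System.out.println(xml);
    }
}
```

With the newlines written as &#xA;, the serialized attribute matches the form used in the original specification file.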

yawlfoundation commented 7 years ago

fixed in latest commit