highmed / highmed-dsf

HiGHmed Data Sharing Framework funded by the German Federal Ministry of Education and Research (BMBF, grant ids: 01ZZ1802E and 01ZZ1802A)
Apache License 2.0

Cannot get StructureDefinitions as HTML if the current ping process is deployed #363

Closed schwzr closed 1 year ago

schwzr commented 2 years ago

In DSF version 0.7.0, you receive an HTTP 500 status code when you try to get the StructureDefinitions as HTML (e.g. https://DSF-FHIR-FQDN/fhir/StructureDefinition?_count=200).

This does NOT affect requests that accept the content types application/fhir+json or application/fhir+xml.

When you remove the ping process from your BPE installation, the StructureDefinitions are returned.

According to the logs, this seems to be a regex issue:

fhir-app-1    | Caused by: java.lang.IllegalArgumentException: Illegal group reference
fhir-app-1    |     at java.util.regex.Matcher.appendExpandedReplacement(Unknown Source) ~[?:?]
fhir-app-1    |     at java.util.regex.Matcher.appendReplacement(Unknown Source) ~[?:?]
fhir-app-1    |     at java.util.regex.Matcher.replaceAll(Unknown Source) ~[?:?]
fhir-app-1    |     at org.highmed.dsf.fhir.adapter.HtmlFhirAdapter.simplifyXml(HtmlFhirAdapter.java:284) ~[dsf-fhir-rest-adapter-0.7.0.jar:0.7.0]
fhir-app-1    |     at org.highmed.dsf.fhir.adapter.HtmlFhirAdapter.writeXml(HtmlFhirAdapter.java:257) ~[dsf-fhir-rest-adapter-0.7.0.jar:0.7.0]
fhir-app-1    |     at org.highmed.dsf.fhir.adapter.HtmlFhirAdapter.writeTo(HtmlFhirAdapter.java:188) ~[dsf-fhir-rest-adapter-0.7.0.jar:0.7.0]
fhir-app-1    |     at org.highmed.dsf.fhir.adapter.HtmlFhirAdapter.writeTo(HtmlFhirAdapter.java:33) ~[dsf-fhir-rest-adapter-0.7.0.jar:0.7.0]

I traced it down to the following string, but at first glance I can't see any obvious reason why it fails:

<expression value="matches('^P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?(T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+)(?:[.,]([0-9]{0,9}))?S)?)?$')"></expression>

You can find a minimal example to reproduce the issue here:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    private static final Pattern CLOSABLE_XML_TAGS = Pattern
            .compile("(m?)[\\t ]*<[a-zA-Z0-9]+( value=\".*\"){0,1}(></[a-zA-Z0-9]+>)");
    public static void main(String[] args) throws IOException {
        String s = "<expression value=\"matches('^P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?(T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+)(?:[.,]([0-9]{0,9}))?S)?)?$')\"></expression>\n";
        Matcher matcher = CLOSABLE_XML_TAGS.matcher(s);
        String s2 = matcher.replaceAll(r -> {
            System.out.println(r.group());
            System.out.println(r.group(3));
            return r.group().replace(r.group(3), "/>");
        });
    }
}

hhund commented 2 years ago

The problem seems to result from the way the replaceAll method works internally with a replacer function: the string returned by the function is processed as a replacement string, so it may not contain unescaped $ characters. Because the returned string is built from the matched input, a $ in the input ends up in the replacement.

Adding s = s.replace("$", "\\$"); to the minimal example above fixes the java.lang.IllegalArgumentException: Illegal group reference and gives the expected result:

<expression value="matches('^P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?(T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+)(?:[.,]([0-9]{0,9}))?S)?)?$')"/>

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    private static final Pattern CLOSABLE_XML_TAGS = Pattern
            .compile("(m?)[\\t ]*<[a-zA-Z0-9]+( value=\".*\"){0,1}(></[a-zA-Z0-9]+>)");
    public static void main(String[] args) throws IOException {
        String s = "<expression value=\"matches('^P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?(T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+)(?:[.,]([0-9]{0,9}))?S)?)?$')\"></expression>\n";
        s = s.replace("$", "\\$"); //<- fix for Illegal group reference
        Matcher matcher = CLOSABLE_XML_TAGS.matcher(s);
        String s2 = matcher.replaceAll(r -> {
            System.out.println(r.group());
            System.out.println(r.group(3));
            return r.group().replace(r.group(3), "/>");
        });
        System.out.println(s2);
    }
}
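
As a possible variation on the workaround above, the JDK provides Matcher.quoteReplacement, which escapes $ and \ in a string so that it is taken literally as a replacement. The following is only a sketch, not the fix applied in DSF: the class name QuoteReplacementExample and the method name simplify are made up for illustration, and the pattern is copied from the minimal example. It quotes the string returned by the replacer function instead of modifying the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementExample {
    // Same pattern as in the minimal example above.
    private static final Pattern CLOSABLE_XML_TAGS = Pattern
            .compile("(m?)[\\t ]*<[a-zA-Z0-9]+( value=\".*\"){0,1}(></[a-zA-Z0-9]+>)");

    // Hypothetical helper: collapses "<tag value="..."></tag>" into "<tag value="..."/>".
    static String simplify(String xml) {
        return CLOSABLE_XML_TAGS.matcher(xml).replaceAll(r -> {
            String replaced = r.group().replace(r.group(3), "/>");
            // The returned string is interpreted as a replacement string,
            // so '$' and '\' must be quoted to be taken literally.
            return Matcher.quoteReplacement(replaced);
        });
    }

    public static void main(String[] args) {
        String s = "<expression value=\"matches('^P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?(T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+)(?:[.,]([0-9]{0,9}))?S)?)?$')\"></expression>\n";
        System.out.println(simplify(s));
    }
}
```

With this approach the input string stays unchanged, so no extra backslashes have to be kept track of outside the replacer function.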

hhund commented 2 years ago

Alternatively we could replace the regex-code with a javax.xml.transform.Transformer:

import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlTransformerTest
{
    public static void main(String[] args) throws Exception
    {
        String s = "<expression value=\"matches('^P(?:([0-9]+)Y)?(?:([0-9]+)M)?(?:([0-9]+)D)?(T(?:([0-9]+)H)?(?:([0-9]+)M)?(?:([0-9]+)(?:[.,]([0-9]{0,9}))?S)?)?$')\"></expression>";

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xalan}indent-amount", "3");
        StringWriter writer = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(s)), new StreamResult(writer));
        String s2 = writer.toString();
        System.out.println(s2);
    }
}

Caution: The Transformer class is not thread-safe.
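
One way to work around the thread-safety caveat is to create a new Transformer for every call, so that no instance is shared between threads. The sketch below assumes this per-call approach is acceptable performance-wise. The class name PerCallTransformer and the method name simplify are hypothetical, and the indentation properties from the example above are left out to keep it short.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PerCallTransformer
{
    // Hypothetical helper: re-serializes the XML with a Transformer that is
    // created per call and never shared between threads.
    static String simplify(String xml) throws TransformerException
    {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter writer = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception
    {
        System.out.println(simplify("<expression value=\"matches('^P$')\"></expression>"));
    }
}
```

If creating a Transformer per call turns out to be too expensive, keeping one instance per thread (for example in a ThreadLocal) would be another option.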

hhund commented 2 years ago

There is another related issue with the HTML output generated for OrganizationAffiliation pages:

<participatingOrganization>
    <reference value="Organization/34575bc7-28b4-4028-bf7a-75bb5777851a">
    <type value="Organization"/>
</participatingOrganization>

The reference tag is missing the closing /. Thanks to G. Fette for reporting the issue.

schwzr commented 2 years ago

There is another related issue regarding the HTML output (e.g. in a FHIR bundle):

<entry>
      ...
      <request>
         <method value="PUT"/>
         <url value="OrganizationAffiliation?primary-organization:identifier=http://highmed.org/sid/organization-identifier|netzwerk-universitaetsmedizin.de&participating-organization:identifier=http://highmed.org/sid/organization-identifier|example.org"/>
      </request>
   </entry>

In the url value the "&" is not XML-escaped (it should be "&amp;"). Below is the correct XML representation (Accept: application/fhir+xml):

<url value="OrganizationAffiliation?primary-organization:identifier=http://highmed.org/sid/organization-identifier|netzwerk-universitaetsmedizin.de&amp;participating-organization:identifier=http://highmed.org/sid/organization-identifier|example.org"></url>
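
To illustrate the escaping that is missing here, the following sketch shows how an attribute value could be escaped when XML text is assembled by hand. It is not taken from the DSF code base: the class name XmlAttributeEscaping, the method name escapeAttribute and the shortened URL are made up for illustration. The "&" has to be replaced first, otherwise the ampersands of the other entities would be escaped a second time.

```java
public class XmlAttributeEscaping
{
    // Hypothetical helper: escapes the characters that are not allowed
    // unescaped inside a double-quoted XML attribute value.
    static String escapeAttribute(String value)
    {
        return value.replace("&", "&amp;") // must come first
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args)
    {
        // Shortened, made-up query string with two parameters.
        String url = "OrganizationAffiliation?a=1&b=2";
        System.out.println("<url value=\"" + escapeAttribute(url) + "\"/>");
    }
}
```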