hapifhir / hapi-fhir

🔥 HAPI FHIR - Java API for HL7 FHIR Clients and Servers
http://hapifhir.io
Apache License 2.0
2.04k stars 1.33k forks source link

Illegal attribute name in the XHTML ('target' on 'a') #1397

Closed seanmcilvenna closed 3 years ago

seanmcilvenna commented 5 years ago

NOTE: Before filing a ticket, please see the following URL: https://github.com/jamesagnew/hapi-fhir/wiki/Getting-Help

Describe the bug When using the $validate operation on a ValueSet resource that has Resource.text.div with an a tag and a target attribute is returning the following error:

Illegal attribute name in the XHTML ('target' on 'a')

To Reproduce Create a ValueSet in HAPI that has ValueSet.text.div with the following content:

<div><a href="test.com" target="_new">some link</a></div>

Then execute POST /ValueSet/XXX/$validate

Expected behavior It should not return an error for the <a> element's target attribute.

markiantorno commented 3 years ago

Alright, so I created a quick test locally for this on my machine, and I can confirm that this is not an issue with HAPI, this is an issue with the core standard validation tools. (https://github.com/hapifhir/org.hl7.fhir.core)

I used an example ValueSet resource with a tag added:

{
  "resourceType": "ValueSet",
  "id": "yesnodontknow",
  "text": {
    "status": "generated",
    "div": "<div xmlns=\"http://www.w3.org/1999/xhtml\">\n <div><a href=\"www.google.ca\" target=\"_new\">test link to google</a></div>\n     <h3>Value Set Contents</h3>\n      <p>This value set contains 3 concepts</p>\n      <table class=\"codes\">\n        <tr>\n          <td style=\"white-space:nowrap\">\n            <b>Code</b>\n          </td>\n          <td>\n            <b>System</b>\n          </td>\n          <td>\n            <b>Display</b>\n          </td>\n          <td>\n            <b>Definition</b>\n          </td>\n        </tr>\n        <tr>\n          <td style=\"white-space:nowrap\">\n            <a name=\"http---terminology.hl7.org-CodeSystem-v2-0136-Y\"> </a>\n            <a href=\"v2/0136/index.html#v2-0136-Y\">Y</a>\n          </td>\n          <td>http://terminology.hl7.org/CodeSystem/v2-0136</td>\n          <td>Yes</td>\n          <td/>\n        </tr>\n        <tr>\n          <td style=\"white-space:nowrap\">\n            <a name=\"http---terminology.hl7.org-CodeSystem-v2-0136-N\"> </a>\n            <a href=\"v2/0136/index.html#v2-0136-N\">N</a>\n          </td>\n          <td>http://terminology.hl7.org/CodeSystem/v2-0136</td>\n          <td>No</td>\n          <td/>\n        </tr>\n        <tr>\n          <td style=\"white-space:nowrap\">\n            <a name=\"http---terminology.hl7.org-CodeSystem-data-absent-reason-asked-unknown\"> </a>\n            <a href=\"codesystem-data-absent-reason.html#data-absent-reason-asked-unknown\">asked-unknown</a>\n          </td>\n          <td>http://terminology.hl7.org/CodeSystem/data-absent-reason</td>\n          <td>Don't know</td>\n          <td>The source was asked but does not know the value.</td>\n        </tr>\n      </table>\n    </div>"
  },
  "url": "http://hl7.org/fhir/ValueSet/yesnodontknow",
  "version": "4.0.1",
  "name": "Yes/No/Don't Know",
  "status": "draft",
  "description": "For Capturing simple yes-no-don't know answers",
  "compose": {
    "include": [
      {
        "valueSet": [
          "http://terminology.hl7.org/ValueSet/v2-0136"
        ]
      },
      {
        "system": "http://terminology.hl7.org/CodeSystem/data-absent-reason",
        "concept": [
          {
            "code": "asked-unknown",
            "display": "Don't know"
          }
        ]
      }
    ]
  },
  "expansion": {
    "identifier": "urn:uuid:bf99fe50-2c2b-41ad-bd63-bee6919810b4",
    "timestamp": "2015-07-14T10:00:00Z",
    "contains": [
      {
        "system": "http://terminology.hl7.org/CodeSystem/v2-0136",
        "code": "Y",
        "display": "Yes"
      },
      {
        "system": "http://terminology.hl7.org/CodeSystem/v2-0136",
        "code": "N",
        "display": "No"
      },
      {
        "system": "http://terminology.hl7.org/CodeSystem/data-absent-reason",
        "code": "asked-unknown",
        "display": "Don't know"
      }
    ]
  }
}

and when I ran it through the core validator, I got the following output:

---- ahref-illegal-name ----------------------------------------------------------------
** Core: 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/mark/.m2/repository/org/slf4j/slf4j-simple/1.7.28/slf4j-simple-1.7.28.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/mark/.m2/repository/ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
Start Validating (8796 to set up)
Times (ms): overall = 86, tx = 71, sd = 0, load = 9, fpe = 3
WARNING: ValueSet: vsd-0: 'Name should be usable as an identifier for the module by machine processing applications such as code generation' Rule 'Name should be usable as an identifier for the module by machine processing applications such as code generation' Failed
ERROR: ValueSet.text.div: txt-1: 'The narrative SHALL contain only the basic html formatting elements and attributes described in chapters 7-11 (except section 4 of chapter 9) and 15 of the HTML 4.0 standard, <a> elements (either name or href), images and internally contained style attributes' Rule 'The narrative SHALL contain only the basic html formatting elements and attributes described in chapters 7-11 (except section 4 of chapter 9) and 15 of the HTML 4.0 standard, <a> elements (either name or href), images and internally contained style attributes' Failed
ERROR: ValueSet.text.div: Illegal attribute name in the XHTML ('target' on 'a')

java.lang.AssertionError: Test ahref-illegal-name: Expected 0 errors, but found 2. 
Expected :0
Actual   :2
<Click to see difference>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.hl7.fhir.validation.tests.ValidationTests.checkOutcomes(ValidationTests.java:398)
    at org.hl7.fhir.validation.tests.ValidationTests.test(ValidationTests.java:245)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:33)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:220)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:53)
Process finished with exit code 255

I'm going to investigate the actual cause now. But I just wanted to confirm that I could reproduce your issue, given the information you provided.

markiantorno commented 3 years ago

Alright, so this is actually not a bug, @seanmcilvenna.

If you look up the documentation for narratives (https://www.hl7.org/fhir/narrative.html):

"The contents of the div element are an XHTML fragment that SHALL contain only the basic HTML formatting elements described in chapters 7-11 (except section 4 of chapter 9) and 15 of the HTML 4.0 standard, elements (either name or href), images and internally contained style attributes. The XHTML content SHALL NOT contain a head, a body element, external stylesheet references, deprecated elements, scripts, forms, base/link/xlink, frames, iframes, objects or event related attributes (e.g. onClick). This is to ensure that the content of the narrative is contained within the resource and that there is no active content. Such content would introduce security issues and potentially safety issues with regard to extracting text from the XHTML. Note that even with these restrictions, there are still several important security risks associated with displaying the narrative."

The gist of it is that the target tag you're using in your href is not considered a basic HTML formatting element (it's in chapter 16 of the HTML 4.0 standard, https://www.w3.org/TR/1998/REC-html40-19980424/html40.pdf), and thus, when the validator parses the tag, it rejects it.

If you want this to pass validation, remove the target part.

So, instead of:

<div><a href="test.com" target="_new">some link</a></div>

put this instead:

<div><a href="test.com">some link</a></div>

That should pass. At least it did when I tested it.

Let me know if you have any questions on this. For now, I'm closing this and marking it as not a bug.