ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogenous data
https://docs.ropensci.org/EML

`<![CDATA[` not always recognized #342

Open peterdesmet opened 2 years ago

peterdesmet commented 2 years ago

write_eml() will preserve text wrapped in <![CDATA[ and ]]> (expected), but the behaviour is not consistent. In the example below, the first string gets entity-escaped while the other two are preserved:

library(EML)

text_1 <- "My text is <a href=\"https://example.com\">html</a>." # Does not work
text_2 <- "<em></em>My text is <a href=\"https://example.com\">html</a>." # Works
text_3 <- "My text is <em>html</em>." # Works

eml <- list(
  dataset = list(
    abstract = list(
      para = list(
        paste0("<![CDATA[", text_1, "]]>"),
        paste0("<![CDATA[", text_2, "]]>"),
        paste0("<![CDATA[", text_3, "]]>")
      )
    )
  )
)

EML::write_eml(eml, "test.xml")

Resulting XML:

<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.2" packageId="ed7f9836-d2ae-4a04-9473-d377db387c54" system="uuid" xsi:schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 https://eml.ecoinformatics.org/eml-2.2.0/eml.xsd">
  <dataset>
    <abstract>
      <para>&lt;![CDATA[My text is &lt;a href="https://example.com"&gt;html&lt;/a&gt;.]]&gt;</para>
      <para><![CDATA[<em></em>My text is <a href="https://example.com">html</a>.]]></para>
      <para><![CDATA[My text is <em>html</em>.]]></para>
    </abstract>
  </dataset>
</eml:eml>
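
To make the difference between the three paragraphs easy to spot, here is a minimal sketch in base R that flags which `<para>` lines had their CDATA opener entity-escaped. It is only a hypothetical check, not part of the EML package: the names `xml_lines`, `escaped`, and `preserved` are made up for illustration, and the three strings are copied from the output above rather than read from disk.

```r
# Hypothetical check (not an EML function): flag <para> lines whose
# CDATA opener was entity-escaped when the file was written.
xml_lines <- c(
  "<para>&lt;![CDATA[My text is &lt;a href=\"https://example.com\"&gt;html&lt;/a&gt;.]]&gt;</para>",
  "<para><![CDATA[<em></em>My text is <a href=\"https://example.com\">html</a>.]]></para>",
  "<para><![CDATA[My text is <em>html</em>.]]></para>"
)

# fixed = TRUE matches the literal string instead of treating it as a regex,
# which matters here because "[" is a regex metacharacter.
escaped   <- grepl("&lt;![CDATA[", xml_lines, fixed = TRUE)
preserved <- grepl("<![CDATA[", xml_lines, fixed = TRUE)

# One row per paragraph: exactly one of the two flags should be TRUE.
print(data.frame(para = seq_along(xml_lines), escaped = escaped, preserved = preserved))
```

To run the same check against the file written above, `xml_lines` could be replaced with `readLines("test.xml")`, assuming each `<para>` element sits on its own line as it does in this output.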

mbjones commented 2 years ago

My immediate guess is that this has the same root cause as #315, and is a side effect of the approach used to simplify DocBook processing. There are some fundamental issues discussed in #315 that would need to be resolved to fix this.