Closed by amoeba 5 years ago
Thanks for the note.
The short answer is that the content of textType nodes like section or para is treated as literal, so you would have to write literal XML:
# textType content is passed through as literal XML, so supply the markup as a string
writeLines(
  as.character(
    emld::as_xml(
      list(additionalInfo = list(section = "<para>some para</para>"))
    )
  )
)
On the practical side, I don't think there's a compelling use case for anyone to either structure or parse text manually as in the problematic examples. Once you say section, what follows is some text that should really be left as XML, or be imported directly from some other format (MS Word, Markdown rendering, etc., like we do in the EML package). I believe that turning each para into its own JSON key / list object would make this text harder to work with, not easier. Unlike the rest of EML, a bunch of <para> items, <title> items, etc. really can only be understood as XML and cannot and should not be interpreted as key-value pairs.
The difficulty is that textType embraces a whole bunch of DocBook that, unlike the rest of EML, cannot be expressed in key-value pairs. To me this is an excellent example of the fundamental philosophical difference between JSON and XML: XML is markup and can do stuff like <para> some <b>bold</b> text</para>, which has no analogous translation into JSON, or into RDF concepts for that matter (or even into object-oriented S4; this problem also affected the S4 version of the package). I think the main reason JSON is easier to work with than XML is precisely that JSON can't do markup; it can strictly only represent key-value pairs. In the emld model (indeed, in any RDF worldview) all textType content is just a 'value'; it's not meant to be decomposed.
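To make the mixed-content point concrete, here is a minimal sketch. It assumes the xml2 package is installed; xml2 is not used anywhere in this thread and is only a convenient way to inspect the nodes. It shows that the para holds text and an element interleaved in a specific order, which a set of named key-value pairs has no slot for.

```r
# Sketch only: xml2 is an assumption here, used purely to inspect the nodes.
library(xml2)

para <- read_xml("<para> some <b>bold</b> text</para>")

# Mixed content: text and element nodes are interleaved, and order matters.
kids <- xml_contents(para)
node_types <- xml_type(kids)   # one type per child node, in document order
n_children <- length(kids)

# The readable text is only recoverable by walking the nodes in order.
full_text <- xml_text(para)

# A naive key-value version (hypothetical, for illustration) has nowhere to
# record that the bold run sits between the two pieces of text.
as_key_value <- list(text = " some  text", b = "bold")

print(node_types)
print(full_text)
```

Nothing here is specific to emld; it only illustrates why treating the whole para as a single opaque value is the simpler contract.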
... hehe, wow, apparently I have more opinions on this thing than I realized. Anyway, hope this helps some, and I'm happy to be convinced that we should change something.
Thank you very much for giving this some thought, @cboettig, these are excellent points. I think you are absolutely correct from the perspective of machine readability and interpretability. A challenge is when we seek to enhance or convey more human readable information within the constraints of XML/EML, which, of course, depends also on which and how EML components are interpreted and displayed (for example by the EDI data portal). @amoeba had suggested (wisely) in our Slack thread that Markdown would be an alternative and, in fact, better approach. I agree, and am very much looking forward to Markdown support in EML 2.2.
@srearl Note that you can already use Markdown as an input in EML 2.1.1 by letting the EML::set_TextType function translate the Markdown into the XML tags.
I think this approach of separating out non-trivial markup text from the eml construction bits is a bit cleaner to read than the above.
writeLines('
## General Protocols

Field methods. All experiments will be carried out in the greenhouse at Harvard Forest. We have developed an instrumentation system ....

Proteomic analysis. Proteomic profiles of microbial communities are determined after separating the microbial ...

## Specific Experiments

Experiment #1. Effects of nutrient enrichment on state changes and [O2] profiles. This experiment alters ...',
"section.md")

eml <- list(additionalInfo = EML::set_TextType("section.md"))
And observe the XML we get back:
writeLines(as.character(
emld::as_xml(eml)
))
<?xml version="1.0" encoding="UTF-8"?>
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1/ eml.xsd">
  <additionalInfo>
    <section>
      <title>General Protocols</title>
      <para>
        Field methods. All experiments will be carried out in the greenhouse
        at Harvard Forest. We have developed an instrumentation system ….
      </para>
      <para>
        Proteomic analysis. Proteomic profiles of microbial communities are
        determined after separating the microbial …
      </para>
    </section>
    <section>
      <title>Specific Experiments</title>
      <para>
        Experiment #1. Effects of nutrient enrichment on state changes and
        [O2] profiles. This experiment alters …
      </para>
    </section>
  </additionalInfo>
</eml:eml>
Thanks for the thoughts and example code, @cboettig. I can live with this and your explanation makes sense re: no good analog.
@srearl popped in to NCEAS EML Slack today with some weird emld behavior:
produces
Instead of the intended
I poked around at the source a bit and didn't quite see what's up but wanted to file an issue. I can look again later on this week I bet.