Open cboettig opened 4 years ago
Good catch. Agree on the current implementation being a bit error-prone. EML 2.2.0 got Markdown support in part because of its popularity and ubiquity but also because the pre-2.2.0 TextType module was really hard to use. I think defaulting to a Markdown-first experience from here on out is a good idea.
Another API it'd be nice to support would be
list(methodStep = list( # or set_methods(markdown = ) instead
list(description = list(
markdown = "# Header
Example text.
"))))
so people can keep everything in the same R/Rmd.
@jeanetteclark are you or others on the Data Team using markdown for your TextType elements? @clnsmth, @earnaud?
@amoeba not usually
Thanks for reaching out @amoeba @cboettig. I'm mostly using .docx because that's the preferred file type our data contributors submit their abstract and methods to us. Sadly, I'm using it to remove most of the text formatting before sending it to set_TextType()
to bypass the time trouble shooting .docx > pandoc > EML translation issues.
I very much welcome improved support for .md > EML (.xml) and many thanks in advance to whoever takes this on!
@clnsmth Thanks! that's a great point. I think we should probably replace the existing mechanism which uses pandoc to convert docx
and .md
into docbook (since too much of modern docbook is just not EML compatible anyway). We can have it instead use pandoc to render .docx files into markdown, and pass-through .md
markdown (at least so long as we are in eml-2.2.0
mode). That way, users can continue to rely on .docx
as their interface but we'd avoid the myriad docbook translation issues.
I don't have any good solution for what to do about this in eml-2.1.1
mode though where we can't generate <markdown>
sections. We could either leave it as is, or perhaps convert to markdown but encode it if it were just a plain text block. This would at least avoid the validation errors. (and we could encourage folks to adopt 2.2.0!)
Thanks @cboettig. I think I’m on board with this but have a clarifying question.
In the proposed changes, will at least the current level of support for .txt
, .docx
, .md
> .xml
be maintained when EML
is operating in eml-2.2.0
mode? This is particularly important for tooling I’ve developed for the non-technical user and would be sad to see it go away : (. Additionally, continued support for .xml
when operating in eml-2.2.0
mode supports data repositories and other metadata consumers using EML 2.2.0 but not ready/wanting to handle <makdown>
sections.
Thanks @clnsmth for the feedback, that's good to know. I'm a bit wary of the transforms into docbook xml
because they are currently not guaranteed to produce valid EML (basically depending on what kind of markup the user uses), but if they are useful we could certainly make the migration in a non-breaking way instead. It is mostly a matter of figuring out what syntax to use. e.g. perhaps we need a toggle-parameter like:
set_TextType("myfile.docx", as = "markdown")
set_methods("mymethods.docx", as = "markdown")
etc to encode in markdown, and maybe:
set_TextType("myfile.docx", as = "xml")
set_methods("mymethods.docx", as = "xml")
to encode as docbook. we could haggle over whether markdown
or xml
would be the default. (Making xml
the default would obviously be the backwards-compatible option, but I think it also seems to be creating a somewhat unnecessary gotcha for new users).
Thoughts?
Thanks again for the feedback!
p.s. accidentally posted the above half-way through writing it, so if you're following via email, see updated thread on GitHub. apologies.
Good point on backwards compatibility, @clnsmth. I think the ideas here are to maintain the ability to bring .txt
, .docx
, and .md
docs into EML via this package but change what happens. The difference is what happens depending on the EML version (emld_db
):
emld_db = 2.1.1 or <
: Convert txt, md, docx, etc. to DocBook sections. Current behavior.emld_db = 2.2.0 or >
: Convert txt, md, docx, etc. to Markdown and place into <markdown>
elements by default.The current function signature of set_TextType
is:
function(file = NULL, text = NULL)
and I think we can get all of this working with a change to
function(file = NULL, text = NULL, markdown = TRUE)
The function will retain the first part of its signature and the added markdown
argument controls whether the function makes use of the EML 2.2.0 markdown
element, defaulting to TRUE
. Here's a confusion matrix to get an idea:
For any file=
input, docx
, md
, or otherwise:
markdown= |
emld_db |
Result |
---|---|---|
TRUE |
<=2.1.1 |
DocBook w/ warning |
FALSE |
<=2.1.1 |
DocBook |
TRUE |
>=2.2.0 |
<markdown> |
FALSE |
>=2.2.0 |
DocBook |
Note: When using the text
argument instead of file
, I think you'd still always get the current behavior of setting a simple text type element instead of DocBook-esque or Markdown.
Does this sound workable?
PS: @clnsmth , you said:
Additionally, continued support for .xml when operating in eml-2.2.0 mode supports data repositories and other metadata consumers using EML 2.2.0 but not ready/wanting to handle
sections.
Did you mean "continued support for our mini-DocBook, aka section
, and para
"? If the default behavior of set_TextType("foo")
changed such that it output an <markdown
by default, would that be a deal-breaker?
Thanks @cboettig @amoeba for considering my use case and these clarifying explanations. I'm completely on-board with changing the default behavior to markdown and retaining backwards compatibility through some permutation of arguments while emld_db = 2.2.0
.
I don't think we have a very good way for users to supply markdown input. (The current examples showing support for markdown files still use the 2.1.1 strategy of converting to docbook via pandoc, which is error-prone as a EML doesn't support all docbook terms).
We can get valid 2.2.0 markdown 'manually', e.g. something like:
but it would be much better to support something closer to the current sytnax in the example, e.g. maybe:
(Functions would also need a check to make sure we were in 2.2.0 mode).
Anyone generating EML with markdown sections currently? Do you do something like the above? Any takers on a PR on this one?