ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogeneous data
https://docs.ropensci.org/EML

get_section(), docbook -> html? #149

Closed: cboettig closed this 8 years ago

cboettig commented 8 years ago

In #137 we discussed generating EML docbook from an external file, which is now implemented in set_section(). It might be convenient to run this in reverse, allowing a user to render a methods section extracted from an EML file into HTML or another format. It would be easy to have an analogous get_section() function for this, using pandoc.
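As a rough illustration of the reverse direction, here is a minimal sketch that assumes the docbook fragment is already available as a string and that pandoc is installed. The helper name render_docbook() and the sample fragment are hypothetical and not part of the EML package; the only library calls used are rmarkdown::pandoc_available() and rmarkdown::pandoc_convert().

```r
# Hypothetical helper (not the EML package's implementation): converts a
# docbook fragment to a text-based format such as HTML or markdown by
# calling pandoc through rmarkdown.
render_docbook <- function(docbook, to = "html") {
  stopifnot(is.character(docbook), length(docbook) == 1)
  if (!rmarkdown::pandoc_available()) {
    stop("pandoc is required but was not found")
  }
  input <- tempfile(fileext = ".xml")
  output <- tempfile(fileext = paste0(".", to))
  writeLines(docbook, input)
  # from = "docbook" tells pandoc to read the input as docbook XML
  rmarkdown::pandoc_convert(input, to = to, from = "docbook", output = output)
  paste(readLines(output, warn = FALSE), collapse = "\n")
}

# Made-up methods fragment, for illustration only
methods_docbook <- "<section><title>Methods</title><para>Plots were sampled weekly.</para></section>"

if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  cat(render_docbook(methods_docbook), "\n")
}
```

Binary targets such as Word would need the output path returned rather than read back as text, so this sketch only covers text formats.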

Also, even though we are working with an element named "section" here, perhaps section isn't the best name for these functions? Any other suggestions?

cboettig commented 8 years ago

The above commit provides a candidate implementation for this, which can render a TextType node as HTML, Word, markdown, etc., using rmarkdown & pandoc, as the reverse of the set_TextType() function.

By default it uses HTML and calls the R browse function, which serves as a more convenient way to read a TextType node in an EML document from within R, e.g.:

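library(EML)
# Example abstract shipped with the package
f <- system.file("examples/hf205-abstract.docx", package = "EML")
# Read the file as a TextType node and coerce it to an abstract
a <- as(set_TextType(f), "abstract")
get_TextType(a)

cboettig commented 8 years ago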

The function is now tested, documented, and exported; feedback welcome.

maelle commented 8 years ago

Are there already functions for doing the reverse, i.e. going from the EML to a sort of human-readable docbook?

cboettig commented 8 years ago

EML TextType elements like abstract and methods are based on a subset of docbook, so rendering them is natural. One could display the rest of the EML as, say, pretty HTML, but exactly how is a little subjective. For a very beautiful example, take a look at how KNB displays the EML metadata for any dataset in its repository.

We aim to make it easy to publish EML (publicly or privately) to the KNB, which would then automatically provide a pretty HTML version of the metadata, along with other benefits.

mbjones commented 8 years ago

@masalmon For several of our applications, we use XSLT to convert EML to an HTML rendering, and then use CSS to format that. The XSLT converter scripts ship with the EML distribution. For the KNB, DataONE, Arctic Data Center, and other repositories, we take this basic HTML DOM created from the XSLT conversion, and then decorate it using JavaScript functions that access several other web services with information related to the EML (for example, download counts for all of the data files in the EML package). Thus, the EML display you see on the KNB is a complex, multi-stage rendering that starts with the EML document, converts it to simple HTML, styles it with CSS, and then modifies that DOM using JavaScript. The code for doing that is all open source and part of our MetacatUI product. It has a fairly deep understanding of what EML represents.
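For readers who want to try the first stage of that pipeline locally from R, here is a minimal sketch of an XSLT transformation using the xml2 and xslt packages. Both the toy EML-like document and the stylesheet are assumptions made up for illustration; they are not the stylesheets that ship with the EML distribution, and real EML documents are namespaced and far richer.

```r
library(xml2)
library(xslt)

# Toy EML-like document (an assumption; real EML has namespaces and
# much more structure).
eml <- read_xml('<eml>
  <dataset>
    <title>Example dataset</title>
    <abstract><para>Weekly plot measurements.</para></abstract>
  </dataset>
</eml>')

# Illustrative stylesheet (hypothetical, not one of the EML distribution's):
# turns the title into a heading and each abstract paragraph into <p>.
style <- read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html><body>
      <h1><xsl:value-of select="eml/dataset/title"/></h1>
      <xsl:for-each select="eml/dataset/abstract/para">
        <p><xsl:value-of select="."/></p>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>')

# Apply the stylesheet and print the resulting HTML
html <- xml_xslt(eml, style)
cat(as.character(html), "\n")
```

The CSS styling and JavaScript decoration stages described above are outside the scope of this sketch.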

maelle commented 8 years ago

Thanks @cboettig & @mbjones! I'll use EML for data that won't be published (or maybe only later), so I'll look at the repositories for inspiration on producing local files that render the EML.

maelle commented 7 years ago

I see I had asked about rendering before. Even without rendering, https://github.com/ropensci/EML/issues/189 can help with inspecting the EML one has created.