ropensci / EML

Ecological Metadata Language interface for R: synthesis and integration of heterogeneous data
https://docs.ropensci.org/EML

Support for older EML formats and other metadata via XSLT transform #100

Closed cboettig closed 5 years ago

cboettig commented 10 years ago

For instance, in eml-2.0.1 the additionalMetadata element has an additionalInformation child node that is not wrapped in a metadata node, which just breaks parsing. See, for example, knb-lter-sev.17804.1.xml.

cboettig commented 10 years ago

@mbjones Any suggested strategies for handling this? For example, is there an XSLT stylesheet we could use to first 'promote' the old format? (If it's just a few changes like this, it might be best for us to handle the mapping explicitly, e.g. map an additionalInformation node into our metadata S4 object.)
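A minimal sketch of the explicit-mapping option, assuming the document is pre-processed with xml2 before it is converted to S4: every child of additionalMetadata other than describes is moved into a new metadata node, which is the wrapper the eml-2.0.1 file above lacks. The function name promote_additional_metadata is hypothetical, and it handles only this one structural difference, not the other breaking changes between versions.

```r
library(xml2)

# Hypothetical pre-processing step: wrap the payload of each <additionalMetadata>
# element in a <metadata> node. <describes> children and an already-present
# <metadata> wrapper are left untouched, so running it twice is harmless.
promote_additional_metadata <- function(doc) {
  nodes <- xml_find_all(doc, "//additionalMetadata")
  for (j in seq_along(nodes)) {
    am <- nodes[[j]]
    kids <- xml_children(am)
    payload <- kids[!xml_name(kids) %in% c("describes", "metadata")]
    if (length(payload) == 0) next
    wrapper <- xml_add_child(am, "metadata")   # new empty <metadata> at the end
    for (i in seq_along(payload)) {
      xml_add_child(wrapper, payload[[i]])     # copy the node under <metadata>
      xml_remove(payload[[i]])                 # then drop the original
    }
  }
  invisible(doc)
}
```

Because xml2 nodes are references into the parsed document, the function modifies `doc` in place; the return value is only for convenience in a pipe.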

mbjones commented 8 years ago

@cboettig Yes, we have XSLT stylesheets that we use in Morpho to forward-migrate EML versions. There are some breaking changes between 2.0.1 and 2.1.0 that require user input to forward-migrate, for which we have a wizard in Morpho. See https://code.ecoinformatics.org/code/eml/trunk/style/

cboettig commented 8 years ago

@mbjones Not sure whether this is definitely in scope for a first release, or whether it could be left to users to transform their files into 2.1.0 first in the meantime. The trouble is that we still don't have very robust support for XSLT transformations from within R; there's just the Omegahat Sxslt package, which hasn't been super stable to install.

Once we have such a utility, we could apply the stylesheet automatically before reading in an older version, unless the automatic conversion is inadvisable.

Right now the parser doesn't check the version before attempting to convert the XML to S4 objects, which means that some compatible 2.0.1 files can be read in. I suppose we should at least warn on such files?
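One way such a check could look, sketched with xml2 and assuming the version can be read from the namespace URI of the root eml element (e.g. `eml://ecoinformatics.org/eml-2.0.1`). The helpers eml_version and check_eml_version are hypothetical names, not part of the EML package.

```r
library(xml2)

# Hypothetical helper: extract the EML version from the root element's namespace
# URI, e.g. "eml://ecoinformatics.org/eml-2.0.1" -> "2.0.1". NA if unrecognized.
eml_version <- function(doc) {
  uri <- xml_find_chr(doc, "namespace-uri(/*)")
  if (!grepl("eml-[0-9.]+$", uri)) return(NA_character_)
  sub("^.*eml-([0-9.]+)$", "\\1", uri)
}

# Hypothetical helper: warn (but do not stop) when a file predates the schema
# version the S4 classes were developed against. Returns the version invisibly.
check_eml_version <- function(doc, supported = "2.1.0") {
  v <- eml_version(doc)
  if (is.na(v)) {
    warning("Could not determine the EML version from the root namespace")
  } else if (package_version(v) < package_version(supported)) {
    warning(sprintf("EML %s is older than %s; some elements may not parse correctly",
                    v, supported))
  }
  invisible(v)
}
```

Calling `check_eml_version(read_xml(path))` before the S4 conversion would emit a warning for a file declaring eml-2.0.1 while still letting compatible files be read in.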

cboettig commented 7 years ago

@mbjones Does the last commit effectively close this? Technically the S4 classes are still developed only against the current schema, so some elements in older schema versions could fail to parse correctly, right? (E.g., I think nonstandard units and additionalMetadata elements look different.) Not sure whether this is something we should really address or just document, though.

There's no stable XSLT package for R right now (though @jeroenooms might be adding a PR to xml2 for this), so we can't really go that route. We might consider adding XSLT-based features down the road if that changes.

jeroen commented 7 years ago

We decided to implement XSLT in a separate xslt package. It is ready to use, but we have to wait to submit it to CRAN until the new version of xml2 is on CRAN. You can start using it already, though; it should be on CRAN soon.

devtools::install_github("jeroenooms/xslt")

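A minimal sketch of how the xslt package could be wired into reading older files, assuming its `xml_xslt(doc, stylesheet)` function and a local copy of one of the Morpho migration stylesheets. The stylesheet path is a placeholder, not the real file name, and migrate_eml is a hypothetical helper.

```r
library(xml2)
library(xslt)

# Hypothetical helper: forward-migrate an EML document with an XSLT stylesheet
# before it is converted to S4 objects. `stylesheet_path` is an assumption:
# point it at a local copy of the relevant Morpho migration stylesheet.
migrate_eml <- function(doc, stylesheet_path) {
  style <- read_xml(stylesheet_path)
  xml_xslt(doc, style)  # the transformed result as a new xml_document
}

# Example (placeholder paths):
# old <- read_xml("knb-lter-sev.17804.1.xml")
# new <- migrate_eml(old, "path/to/migration-stylesheet.xsl")
```

Since some 2.0.1 to 2.1.0 changes need user input, running this automatically may not always be advisable; it fits best behind an explicit opt-in.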
cboettig commented 7 years ago

@jeroenooms Awesome, thanks. (Yup, a second package makes more sense; no one wants to be surprised by libxslt-dev suddenly becoming an additional external dependency of xml2 when it wasn't before.)

jeroen commented 7 years ago

Let me know if this package works for you. I have only tested it with some hello-world examples because I don't really use XSLT myself.