cboettig closed this issue 5 years ago
Is the idea here that we embed the semantics in the meta tags? Or would we be able to reference the semantics elsewhere, in a separate file?
@emhart Both. The above gets embedded in the EML in an additionalMetadata element. Then we can parse it as XML content when we read.eml, or we can pipe the whole darned EML file through an RDFa distiller, and out will come an RDF version of this metadata (in whatever format we want: N3, Turtle, etc., but we'll use RDF-XML to illustrate). We can then explore that with whatever semantic tools we have handy to chew on RDF. For instance, I illustrate both XML parsing and RDF SPARQL queries on a minimal EML file in this example/test:
https://github.com/ropensci/reml/blob/0ac91203026779f3a89d5cc42470faaab879a82a/inst/tests/test_semantics.R (updated link, fixed first xpath query)
enjoy!
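A minimal sketch of the XML-parsing route described above, assuming the CRAN XML package is installed. The EML fragment, the span elements, the property values, and the OBOE namespace URI are illustrative assumptions and are not taken from the linked test file.

```r
library(XML)  # assumes the CRAN 'XML' package is available

# Hypothetical, heavily trimmed EML fragment with RDFa-style annotations
# inside additionalMetadata. Element names and the OBOE URI are assumptions.
eml_txt <- '<eml xmlns:o="http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl#">
  <dataset><title>Example dataset</title></dataset>
  <additionalMetadata>
    <metadata>
      <span property="o:Entity">frog</span>
      <span property="o:Characteristic">mass</span>
    </metadata>
  </additionalMetadata>
</eml>'

doc <- xmlParse(eml_txt, asText = TRUE)

# Select every annotated node under additionalMetadata with XPath
nodes <- getNodeSet(doc, "/eml/additionalMetadata/metadata//*[@property]")

# Collect the property name and text value of each annotation
annotations <- data.frame(
  property = sapply(nodes, xmlGetAttr, "property"),
  value    = sapply(nodes, xmlValue),
  stringsAsFactors = FALSE
)
print(annotations)
```

The RDFa-distiller route would start from the same file; only the XPath route is sketched here because it needs nothing beyond an XML parser.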
@emhart The semantics are implied due to the namespace association. The "o:" prefix is linked to the OBOE namespace, which defines the semantics of the properties.
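To make the namespace association concrete, here is a small sketch (assuming the CRAN XML package) that expands a prefixed property such as "o:Entity" into the full URI it stands for. The one-line document, the OBOE namespace URI, and the helper `expand_curie` are hypothetical and only for illustration.

```r
library(XML)  # assumes the CRAN 'XML' package is available

# Hypothetical one-element document; the namespace URI is an assumption
doc <- xmlParse(
  '<eml xmlns:o="http://ecoinformatics.org/oboe/oboe.1.0/oboe-core.owl#"><span property="o:Entity">frog</span></eml>',
  asText = TRUE
)

# Named character vector mapping each declared prefix to its namespace URI
ns <- xmlNamespaceDefinitions(xmlRoot(doc), simplify = TRUE)

# Hypothetical helper: replace the prefix of "prefix:term" with its URI
expand_curie <- function(curie, ns) {
  parts <- strsplit(curie, ":", fixed = TRUE)[[1]]
  paste0(ns[[parts[1]]], parts[2])
}

full_uri <- expand_curie("o:Entity", ns)
print(full_uri)
```

The prefix itself carries no meaning; any other prefix bound to the same URI would identify the same property.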
@cboettig Regarding your second point, we have an XSLT as part of Metacat that can do a minimal EML->RDF translation, which was used in some early work on supporting LSIDs. It's minimal, but a decent starting point.
@emhart After re-reading your comment, I think I misinterpreted your question.
@mbjones awesome. Yeah, minimal is fine; it might be a nice proof-of-concept to include along with our other stylesheets and see what users find most useful. Can you link me to the XSLT? With an increasing number of resources being available in RDF, that particular case becomes more compelling. Doesn't DataONE have an associated triplestore? At this stage I imagine the SOLR queries are more useful, but who knows.
Also, @emhart might have mentioned to you that he and I have had some discussions about his work at NEON and in providing EML for some of their data products. It sounds like a semantically enhanced EML could be particularly promising in that case.
I'm still getting my head wrapped around SPARQL queries but the ability to do these from R, as in my example linked above, is a nice touch.
@cboettig As I said, it's rudimentary and no longer maintained, so it's out of date with respect to the current EML version. But you can find the XSLT here: https://code.ecoinformatics.org/code/metacat/trunk/lib/lsid_conf/eml-2.0.1.xslt
Okay, decided I might learn some really basic XSLT by writing a style file to pull some standard dublin core-type terms from /eml/dataset and provide them as RDF. No idea if this implementation would really be best-practice, but fun learning exercise anyhow.
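library(Sxslt)
# Example EML file shipped with the reml package
infile <- system.file("examples", "hf205.xml", package = "reml")
# Apply the stylesheet (path is relative to the reml source directory)
xsltApplyStyleSheet(infile, "inst/xsl/eml211_to_rdf.xsl")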
Instead of the dc: namespace definition appearing in one of the parent nodes, XSLT adds it to each dc:-prefixed element explicitly. This could easily be extended and could no doubt be improved upon.
FYI, we also have a minimal EML -> DC XSLT that is used to produce DC RDF for our Metacat OAI-PMH implementation. See https://code.ecoinformatics.org/code/metacat/trunk/lib/oaipmh/
Might be useful to you for comparison.
@mbjones those are beautiful! Yes, nice to see how that's done. I think these stylesheets could be useful to reml users seeking to do some triples extraction and manipulation of a bunch of XML with the rrdf package, for instance, so I'll include them in the xsl collection.
Ultimately it would be nice to include things beyond the Dublin Core, so that we can do more with the semantics (I realize that's outside of the OAI-PMH use case for the XSLT you link, but just as an extension). For instance, it might be nice to add taxonomicCoverage in terms such as the VTO. Then one could construct a SPARQL query to say things like "give me all datasets covering frogs".
This issue builds on the ideas discussed in #5. We can broadly separate out three use-cases for RDFa annotations: