ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML

Parsing of XML namespace declarations in XML input is incorrect #124

Closed hlapp closed 8 years ago

hlapp commented 8 years ago

Perhaps RNeXML has a built-in expectation for the abbreviation of the NeXML namespace? The following lines result in an error, complaining about the ns namespace:

<nexml xmlns="http://www.nexml.org/2009" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.9" xsi:schemaLocation="http://www.nexml.org/2009 http://www.nexml.org/2009/nexml.xsd http://www.bioontologies.org/obd/schema/pheno http://purl.org/phenoscape/phenoxml.xsd">
  <meta xmlns:ns="http://www.nexml.org/2009" xmlns:ter="http://purl.org/dc/terms/" xsi:type="ns:LiteralMeta" property="ter:creator"/>

This shouldn't result in an error. If one changes ns:LiteralMeta to nex:LiteralMeta (with nex not defined anywhere!), the error goes away. So this seems backwards to me - what shouldn't give an error does, and what should give an error doesn't.

What prompts this is that the Phenoscape API (cc @balhoff) apparently returns NeXML in which the NeXML namespace is declared anew (by an xmlns:ns="http://www.nexml.org/2009" attribute) on every single element that needs it, rather than once at the root element, and it always uses the prefix ns (which is legitimate, even if not pretty).

I have also raised this with the Phenoscape API (phenoscape/phenoscape-kb-services#12), asking that the namespace be declared less verbosely and with the standard prefix, but RNeXML should be able to read legitimately formatted NeXML.

For reference, here is an original file as returned by the Phenoscape API, and here is an edited file with the change throughout as per above. The former raises an error with nexml_read(), the latter doesn't (but arguably should).

cboettig commented 8 years ago

Yes, this is due to issue #51.

I believe this is essentially because NeXML uses namespace-prefixed attribute values (such as xsi:type="ns:LiteralMeta"), which XML parsers treat as plain strings rather than resolving the prefix. @hlapp's solution in #51 looks reasonable but has not yet been implemented.
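
To illustrate the point about prefixed attribute values, here is a minimal Python sketch (not RNeXML code, and not necessarily the approach proposed in #51) of resolving an xsi:type value against the namespace declarations in scope, so that the prefix itself no longer matters. The function name resolve_xsi_types and the sample document are hypothetical; the document is a closed-off version of the snippet from the report.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

def resolve_xsi_types(xml_text):
    """Return (namespace URI, local name) for each xsi:type value, in document order."""
    bindings = []   # stack of (prefix, uri) pairs currently in scope
    resolved = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start", "end-ns")):
        if event == "start-ns":
            bindings.append(payload)      # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            bindings.pop()                # a declaration went out of scope
        else:
            value = payload.get(XSI_TYPE) # payload is the element being opened
            if value is None:
                continue
            # An unprefixed value yields prefix "", which is how the
            # default namespace is reported by start-ns.
            prefix, _, local = value.rpartition(":")
            # The innermost declaration of a prefix wins.
            uri = next((u for p, u in reversed(bindings) if p == prefix), None)
            if uri is None:
                raise ValueError(f"undeclared namespace prefix in xsi:type: {value!r}")
            resolved.append((uri, local))
    return resolved

# Hypothetical sample: the fragment from the report, closed so that it is well-formed.
DOC = """<nexml xmlns="http://www.nexml.org/2009"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.9">
  <meta xmlns:ns="http://www.nexml.org/2009" xmlns:ter="http://purl.org/dc/terms/"
        xsi:type="ns:LiteralMeta" property="ter:creator"/>
</nexml>"""

print(resolve_xsi_types(DOC))
```

With this kind of resolution, ns:LiteralMeta maps to the NeXML namespace plus the local name LiteralMeta, while an undeclared prefix such as nex raises an error, which matches the behaviour the report argues for.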

balhoff commented 8 years ago

@hlapp in the next few weeks I will try to look into how to remove the redundant namespace declarations. I recall that it resulted from my workaround for the fact that the XML library I was using didn't automatically handle namespaces for attribute values.

hlapp commented 8 years ago

As an additional piece of information, here is the error that results from nexml_read() on the output returned by the Phenoscape API:

 Error in fromNeXML(new(type[1]), from) : 
  error in evaluating the argument 'obj' in selecting a method for function 'fromNeXML': Error in getClass(Class, where = topenv(parent.frame())) : 
  “ns:LiteralMeta” is not a defined class 

Traceback:

15 fromNeXML(new(type[1]), from) 
14 asMethod(object) 
13 FUN(X[[i]], ...) 
12 lapply(kids[names(kids) == "meta"], as, "meta") 
11 initialize(value, ...) 
10 initialize(value, ...) 
9 new("ListOfmeta", lapply(kids[names(kids) == "meta"], as, "meta")) 
8 .nextMethod(obj = obj, from = from) 
7 callNextMethod() 
6 fromNeXML(new("nexml"), from) 
5 fromNeXML(new("nexml"), from) 
4 asMethod(object) 
3 as(xmlRoot(doc), "nexml") 
2 nexml_read("./inst/examples/test_original.xml") at pk_ontotrace.R#56
1 rphenoscape::test_nexml() 

balhoff commented 8 years ago

I was wrong that this came from a workaround of mine. I use the XMLBeans API to write the documents. It looks like it always redundantly declares a namespace when writing QName attribute values (it's safer). I know how to specify namespace prefixes for the whole document, but haven't yet found a way to specify which prefixes I want for these QName values.