ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML

nexml_validate issues warnings on validating valid NeXML #128

Closed (hlapp closed this issue 8 years ago)

hlapp commented 8 years ago

Here's the log:

> nexml_validate("./inst/examples/test_original.xml")
[1] FALSE
Warning message:
In nexml_validate("./inst/examples/test_original.xml") :
  Validation failed, error messages: 
  'http://purl.obolibrary.org/obo/VTO_0036225' is not a valid xml NCName for Bio::Phylo::Taxa::Taxon=SCALAR(0x1ce5440)
  Validation failed, error messages:
  'http://purl.obolibrary.org/obo/VTO_0036225' is not a valid xml NCName for Bio::Phylo::Taxa::Taxon=SCALAR(0x1ce5440)

The file (https://github.com/xu-hong/rphenoscape/blob/master/inst/examples/test_original.xml) is the original NeXML returned by the Phenoscape API.

One thing that might come into play here is the use of HTTP URIs as local identifiers. I have filed issue phenoscape/phenoscape-kb-services#15 asking whether this is intentional and what motivates it.

cboettig commented 8 years ago

Right, but these errors come straight from the nexml.org validator, and I get the same errors when I upload the file to the validator in the browser, so this is not an RNeXML issue.

Incidentally, I'm getting validation errors from the nexml.org validator when I try with no namespaces on the elements as well (just as we discussed), even though I would have thought the default namespace would be inferred. @rvosa did I misunderstand something here? Possibly this is a bug on the validator?

rvosa commented 8 years ago

These errors are not mysterious at all: XML identifiers (that is, id attributes that can be referenced elsewhere in a document) have to be "non-colonized names" (NCName). URIs are not suited for this: they contain colons, as well as other characters that (AFAIK) are also not allowed under the production rules for NCNames (namely, the forward slashes).
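To make the NCName rule concrete, here is a minimal sketch that tests candidate id values against an ASCII-only approximation of the NCName production (a Name that contains no colon). The helper name is_ascii_ncname is hypothetical, and the pattern deliberately ignores the many non-ASCII letters the real production also allows, so treat it as an illustration rather than a full validator.

```python
import re

# ASCII-only approximation of NCName: start with a letter or underscore,
# then letters, digits, '.', '-' or '_'. Colons and slashes never match.
# (The real production also admits many non-ASCII characters.)
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(value):
    """Return True if value matches the ASCII subset of the NCName production."""
    return _NCNAME_ASCII.fullmatch(value) is not None


for candidate in ["http://purl.obolibrary.org/obo/VTO_0036225", "VTO_0036225"]:
    print(candidate, is_ascii_ncname(candidate))
```

The full URI fails because of its ':' and '/' characters, while the bare local part "VTO_0036225" is an acceptable id.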


rvosa commented 8 years ago

> Incidentally, I'm getting validation errors from the nexml.org validator when I try with no namespaces on the elements as well (just as we discussed), even though I would have thought the default namespace would be inferred. @rvosa did I misunderstand something here? Possibly this is a bug on the validator?

It is impossible to say without seeing the input file and the log. Apart from the integrity checks about whether the right blocks are referring to each other (which you can't really express in XML Schema) the validation is for the most part a totally generic XML Schema validation that involved essentially zero coding on my end, so the scope for bugs there is probably limited.
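As a rough illustration of the kind of cross-block integrity check mentioned above (as opposed to the generic XML Schema part), here is a minimal sketch. It assumes NeXML's convention that trees and characters blocks point at an otus block through an otus attribute; the function name dangling_otus_refs and the example document are hypothetical, and this is not the nexml.org validator's actual code.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/2009}"


def dangling_otus_refs(xml_text):
    """Hypothetical check: report trees/characters blocks whose otus="..."
    attribute does not match the id of any <otus> block in the document."""
    root = ET.fromstring(xml_text)
    otus_ids = {el.get("id") for el in root.iter(NEX + "otus")}
    dangling = []
    for tag in ("trees", "characters"):
        for block in root.iter(NEX + tag):
            ref = block.get("otus")
            if ref not in otus_ids:
                dangling.append(f"{tag} {block.get('id')} -> {ref}")
    return dangling


# Hypothetical document: trees2 refers to an otus block that does not exist.
example = """<nexml xmlns="http://www.nexml.org/2009">
  <otus id="otus1"/>
  <trees id="trees1" otus="otus1"/>
  <trees id="trees2" otus="otus9"/>
</nexml>"""

print(dangling_otus_refs(example))
```

Both trees blocks are schema-valid on their own; only a check that looks across blocks notices that trees2 points at nothing.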

hlapp commented 8 years ago

The XML isn't valid. See @balhoff's comments on phenoscape/phenoscape-kb-services#15

cboettig commented 8 years ago

@rvosa @hlapp @balhoff Thanks, that makes perfect sense in the case of the phenoscape example.

I'm still a little confused by how validation treats namespace prefixes like nex: on attribute values (particularly the values of xsi:type attributes). For instance, this NeXML file (https://github.com/ropensci/RNeXML/blob/master/inst/examples/meta_example.xml) is valid according to the online validator, and it uses bare xsi:type values in meta elements, e.g. it uses:

 <meta xsi:type="LiteralMeta"

instead of

 <meta xsi:type="nex:LiteralMeta"

However, when I remove the nex: prefixes from the character xsi:type values in this other valid NeXML file (https://github.com/ropensci/RNeXML/blob/master/inst/examples/characters.xml), it stops being valid. Why? Why isn't the top-level namespace inferred automatically?

If I understood the recent discussion correctly, we felt that it was best to ignore these prefixes until we could expand them properly, and to omit them when generating XML, for compatibility. Maybe I got that wrong.

rvosa commented 8 years ago


I wonder what would happen if you removed the xsi:schemaLocation attribute from the file that fails if the xsi:type is not fully qualified. In your former case (the file that succeeds whether or not there is a prefix) we don't actually say anywhere explicitly where the schema is located - though the validator knows, on the basis of the namespace URI. In the latter, we do give a schema location. Perhaps the validator tries (and fails) to do something with that in the case of the default namespace?
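A related thing to check alongside the schemaLocation idea: under XML Schema, an xsi:type value is a QName, so an unprefixed value is resolved against the default namespace (xmlns="...") in scope at that element, and it lands in no namespace at all if no default is declared. The minimal sketch below shows which namespace a bare or prefixed xsi:type value expands to. It assumes the declarations sit on the root element, flattens namespace scoping, and uses hypothetical helper names; it does not by itself settle which explanation applies to characters.xml.

```python
import io
import xml.etree.ElementTree as ET

NEX_URI = "http://www.nexml.org/2009"


def declared_namespaces(xml_text):
    """Collect prefix -> URI declarations ('' is the default namespace).
    Simplification: ignores scoping; later declarations overwrite earlier ones."""
    ns = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        ns[prefix] = uri
    return ns


def resolve_qname(value, ns):
    """Expand an xsi:type value such as 'nex:LiteralMeta' to '{uri}local'."""
    prefix, sep, local = value.partition(":")
    if not sep:  # unprefixed: use the default namespace, if any
        prefix, local = "", value
    if prefix not in ns:
        if prefix == "":
            return local  # no default namespace declared: name has no namespace
        raise ValueError(f"undeclared prefix: {prefix}")
    return f"{{{ns[prefix]}}}{local}"


# Hypothetical roots: one declares a default namespace, one only a 'nex' prefix.
WITH_DEFAULT = f'<nexml xmlns="{NEX_URI}" xmlns:nex="{NEX_URI}"/>'
PREFIX_ONLY = f'<nex:nexml xmlns:nex="{NEX_URI}"/>'

for doc in (WITH_DEFAULT, PREFIX_ONLY):
    ns = declared_namespaces(doc)
    print(resolve_qname("LiteralMeta", ns), resolve_qname("nex:LiteralMeta", ns))
```

With a default namespace declared, the bare and prefixed forms expand to the same NeXML type; with only the nex prefix declared, the bare "LiteralMeta" has no namespace, so a schema validator would not find it among the NeXML types.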

balhoff commented 8 years ago

@hlapp XML coming out of OntoTrace should validate completely now (since 2015-10-5). Please let me know if you encounter any problems.

cboettig commented 8 years ago

Should be fixed by PR #133