ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML

Ontotrace Example is not valid NeXML? #164

Closed cboettig closed 6 years ago

cboettig commented 6 years ago

@rvosa the ontotrace example, https://github.com/ropensci/RNeXML/blob/master/inst/examples/ontotrace-result.xml is failing on the NeXML validator. Apparently the id on the otus block is not an NCName?

rvosa commented 6 years ago

I'm fairly sure the only allowed non-alphanumeric character in NCNames is the underscore. No hyphens.

cboettig commented 6 years ago

@rvosa Are you sure? I would have thought "-" was okay, just not stuff like ":" or "/". These are UUIDs in the Ontotrace, right?

I'm just looking at https://stackoverflow.com/questions/1631396/what-is-an-xsncname-type-and-when-should-it-be-used

rvosa commented 6 years ago

Mmmm... maybe I'm getting this wrong. I guess it is no hyphens as the first character?

cboettig commented 6 years ago

Ah, I think the rule is that the first character cannot be a digit, which is violated by the block that follows: https://github.com/ropensci/RNeXML/blob/master/inst/examples/ontotrace-result.xml#L37
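
To make the rule concrete, here is a minimal sketch of an NCName check. It is an ASCII-only approximation, not the full XML production (which also allows many non-ASCII letters), and the name `is_ncname_ascii` is made up for illustration. The assumption is: the first character must be a letter or underscore, later characters may also be digits, hyphens, or periods, and colons are never allowed.

```python
import re

# ASCII-only approximation of the NCName production (an assumption for
# illustration; the real production also admits many non-ASCII characters).
# First character: letter or underscore.
# Remaining characters: letters, digits, underscore, period, hyphen.
_NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ncname_ascii(value: str) -> bool:
    """Return True if value looks like a valid NCName (ASCII subset only)."""
    return _NCNAME_ASCII.fullmatch(value) is not None


if __name__ == "__main__":
    candidates = [
        "a1b2c3d4-0000-4000-8000-000000000000",  # UUID starting with a letter
        "1a2b3c4d-0000-4000-8000-000000000000",  # UUID starting with a digit
        "-leading-hyphen",
        "ns:local",
    ]
    for candidate in candidates:
        print(candidate, is_ncname_ascii(candidate))
```

Under this approximation a UUID is only a problem when it happens to start with a digit, which would explain why the ids are only sometimes invalid.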

cboettig commented 6 years ago

and apparently we already discovered this circa 2013... https://github.com/ropensci/RNeXML/issues/14#issuecomment-29428306 whoops!

@hlapp I don't suppose phenoscape is going to do anything about its habit of using UUID ids that are sometimes not valid NCNames?

hlapp commented 6 years ago

I myself am not a big fan of UUIDs, but I think programming-wise they're one of the best, if not the best, choice available for the purpose they are generated for in Phenoscape.

However, it would seem that the issue of them leading to sometimes invalid NCName-type identifiers is a bug, and not hard to circumvent. For example, we could just always prefix them (e.g., with U or U-).

CC @balhoff - should we file a bug in the respective Phenoscape repo, and which one would that be: phenoscape-kb-services?

balhoff commented 6 years ago

I had fixed this a while back, but it seems I missed one place where a UUID is used. This should be corrected by https://github.com/phenoscape/phenoscape-kb-services/commit/4bc624b106d86f8c9310ae6b6c81172b2ed3508c.

balhoff commented 6 years ago

That should fix IDs in OntoTrace downloads (please let me know if you see that it's not the case). There are some existing annotation files (not from OntoTrace) in our data repository which have some bad IDs. I might need to just find and replace all those manually.

balhoff commented 6 years ago

> I might need to just find and replace all those manually.

That's done now also.

cboettig commented 6 years ago

Nice Jim, you're fast! Thanks!

hlapp commented 6 years ago

👏

cboettig commented 6 years ago

p.s. one day I'd like to hear @hlapp's take on UUIDs.

One thing that's been annoying with JSON-LD @ids is that they are all URIs with protocol included, meaning that http://schema.org/name isn't the same as https://schema.org/name as far as the algorithms are concerned...
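
A tiny illustration of what I mean, using plain string comparison as a stand-in for how the identifiers end up being compared. The `force_https` helper is a hypothetical workaround of my own, not something the JSON-LD algorithms do for you.

```python
from urllib.parse import urlsplit, urlunsplit

HTTP_ID = "http://schema.org/name"
HTTPS_ID = "https://schema.org/name"


def force_https(iri: str) -> str:
    """Hypothetical pre-processing step: rewrite an http scheme to https."""
    parts = urlsplit(iri)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    # Compared as strings, the two identifiers are simply different.
    print(HTTP_ID == HTTPS_ID)
    # After normalizing the scheme they compare equal.
    print(force_https(HTTP_ID) == force_https(HTTPS_ID))
```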

balhoff commented 6 years ago

> Nice Jim, you're fast! Thanks!

Sort of. I've known about this problem for years. 😆