ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML
Other
13 stars 9 forks source link

Fixes to get_characters #126

Closed sckott closed 8 years ago

sckott commented 8 years ago

125

These changes fix get_characters() so that the fxn works more generally, with associated change to namespace map

hlapp commented 8 years ago

Am I right that this can only be expected to fix the character matrix extraction from the XML? @xu-hong and I continue to see the same issue with nexml_read() failing to read the NeXML emitted by the Phenoscape API.

The character matrix extraction now succeeds if the Phenoscape API output is manually modified to use nex:LiteralMeta instead of ns:LiteralMeta etc.

FYI, the error from nexml_read() is the following:

 Error in fromNeXML(new(type[1]), from) : 
  error in evaluating the argument 'obj' in selecting a method for function 'fromNeXML': Error in getClass(Class, where = topenv(parent.frame())) : 
  “ns:LiteralMeta” is not a defined class 
cboettig commented 8 years ago

You're correct. We need to go through the code and replace all the hard-wired namespaces on attribute elements with the same fix of at least stripping the namespace (though ultimately we should rather be expanding it based on the declaration, but that gets messy very quickly since it cannot be handled by the existing XML namespace logic, as it is defined only at the schema level)

hlapp commented 8 years ago

@cboettig am I right that if the Phenoscape API returned NeXML with nex: as the namespace for attribute values that that should then work out of the box and not be dependent on when you get around to make this fix? cc @balhoff

We're trying to determine how we can best make progress here because this is now holding up rphenoscape. I'd love if we don't have to go to the extent of subjecting returned NeXML to wholesale string substitution - that'd be pretty messy.

balhoff commented 8 years ago

@hlapp I will look at how I might switch to nex: ASAP.

cboettig commented 8 years ago

Yes that should fix it

On Wed, Sep 30, 2015, 2:16 PM Jim Balhoff notifications@github.com wrote:

@hlapp https://github.com/hlapp I will look at how I might switch to nex: ASAP.

— Reply to this email directly or view it on GitHub https://github.com/ropensci/RNeXML/pull/126#issuecomment-144545441.

http://carlboettiger.info

balhoff commented 8 years ago

So do the attribute values need to have the nex: prefix? Ideally they wouldn't have any prefix at all since the default namespace of the whole document is NeXML. The schema defines these as QNames, for which I believe the prefix is optional (so that you can use the default namespace).

http://www.w3.org/TR/1999/REC-xml-names-19990114/#dt-qname

cboettig commented 8 years ago

I agree, ideally no namespace at all. Some have hardwired nex namespace right now, I should be able to fix that as soon as I get a free moment at a computer. Sorry to be the slow cog

On Wed, Sep 30, 2015, 6:30 PM Jim Balhoff notifications@github.com wrote:

So do the attribute values need to have the nex: prefix? Ideally they wouldn't have any prefix at all since the default namespace of the whole document is NeXML. The schema defines these QNames, for which I believe the prefix is optional (so that you can use the default namespace).

http://www.w3.org/TR/1999/REC-xml-names-19990114/#dt-qname

— Reply to this email directly or view it on GitHub https://github.com/ropensci/RNeXML/pull/126#issuecomment-144590285.

http://carlboettiger.info

cboettig commented 8 years ago

Gave this a try on drop-nex branch, but hitting validator issues on my tests that seem to be related to the absence of namespaces on these attributes (now that I fixed tests on master, sorry about that too). see #128

Validation failed, error messages: 
                    Can't locate Bio/Phylo/Matrices/Datatype/Continuouscells.pm in @INC (@INC contains: ../site/lib ../../bio-phylo/lib ../../perllib/arch ../../perllib /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at ../../bio-phylo/lib/Bio/Phylo/Util/CONSTANT.pm line 242, <DATA> line 1.
rvosa commented 8 years ago

Mmmm... this actually does look like a bug on the end of the validator, that is, the part that does the reference checking (which I had to code "by hand"). The hand-coded part of the validator - wrongly - expects that xsi:type statements always have a namespace prefix.

On Thu, Oct 1, 2015 at 5:15 AM, Carl Boettiger notifications@github.com wrote:

Gave this a try on drop-nex branch, but hitting validator issues on my tests that seem to be related to the absence of namespaces on these attributes (now that I fixed tests on master, sorry about that too). see

128 https://github.com/ropensci/RNeXML/issues/128

Validation failed, error messages: Can't locate Bio/Phylo/Matrices/Datatype/Continuouscells.pm in @INC (@INC contains: ../site/lib ../../bio-phylo/lib ../../perllib/arch ../../perllib /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at ../../bio-phylo/lib/Bio/Phylo/Util/CONSTANT.pm line 242, line 1.

— Reply to this email directly or view it on GitHub https://github.com/ropensci/RNeXML/pull/126#issuecomment-144604937.

rvosa commented 8 years ago

Well, mmmm..., wait a sec. How is the type name written? It needs to be CamelCase, i.e. ContinuousCells. Is it possible that this is not done correctly?

cboettig commented 8 years ago

I think the name is correct, all that changed is the removal of the prefix. I put links to example XML files in #128 where I just removed the prefix and the validator gets unhappy. It does look like the validator gives the error message in not-quite camelCase even though the original file is camelCased. (sorry to cross my threads, but see #128)

rvosa commented 8 years ago

Ok, then there is a bug in the parser on the server....

On Thu, Oct 1, 2015 at 4:50 PM, Carl Boettiger notifications@github.com wrote:

I think the name is correct, all that changed is the removal of the prefix. I put links to example XML files in #128 https://github.com/ropensci/RNeXML/issues/128 where I just removed the prefix and the validator gets unhappy. It does look like the validator gives the error message in not-quite camelCase even though the original file is camelCased. (sorry to cross my threads, but see #128 https://github.com/ropensci/RNeXML/issues/128)

— Reply to this email directly or view it on GitHub https://github.com/ropensci/RNeXML/pull/126#issuecomment-144750904.

rvosa commented 8 years ago

I applied a fix to the parser so that it now accepts xsi:type attributes without namespace prefixes: https://github.com/rvosa/bio-phylo/commit/ec6ab9446a133e3d0943cdd565736de1f13bf1da

cboettig commented 8 years ago

Nice, thanks. Is that live on the online validator now?

rvosa commented 8 years ago

Not yet, I'm afraid. I'm in a bit of a bind with this: nexml.org is still being hosted on a nescent server where I can't expect a great deal of support (for obvious reasons). What I really should be doing is reinstall the website on a naturalis server. All the arrangements for that are in place but it'd take a bit more time than I have today. Can we live with the fact that non-prefixed xsi:type attributes should validate, which they do locally, but that this will take till some time next week to become live?

On Thu, Oct 1, 2015 at 6:51 PM, Carl Boettiger notifications@github.com wrote:

Nice, thanks. Is that live on the online validator now?

— Reply to this email directly or view it on GitHub https://github.com/ropensci/RNeXML/pull/126#issuecomment-144784656.

hlapp commented 8 years ago

Validation isn't mandatory to succeed for reading a file to succeed so I suppose this isn't a hold up.

cboettig commented 8 years ago

@rvosa No problem, no rush from me, I just wanted to make sure that the errors I still got were not due to something else I screwed up.

@hlapp Meanwhile, can you just try the drop-nex branch? Like I mentioned, it's still failing some tests, I think all due to this but maybe not; I haven't had time myself to debug more. Hopefully it should fix all those issues.

hlapp commented 8 years ago

@hlapp Meanwhile, can you just try the drop-nex branch? Like I mentioned, it's still failing some tests, I think all due to this but maybe not; I haven't had time myself to debug more. Hopefully it should fix all those issues.

This should go to @xu-hong. I won't have time before Wednesday next week.

rvosa commented 8 years ago

@rvosa https://github.com/rvosa No problem, no rush from me, I just wanted to make sure that the errors I still got were not due to something else I screwed up.

Well, I guess I can't say for sure ;-)

But errors that spit out messages such as "Can't locate Bio/Phylo/Matrices/Datatype/XXX.pm in @INC..." are ones that are triggered because the fix hasn't gone live yet.

xu-hong commented 8 years ago

@cboettig @hlapp
Hi I tried the drop-nex branch and test nexml_read() against the NeXML returned from the Phenoscape API, stored in this file. And I am still getting the same error:

Error in fromNeXML(new(type[1]), from) : 
  error in evaluating the argument 'obj' in selecting a method for function 'fromNeXML': Error in getClass(Class, where = topenv(parent.frame())) : 
  “ns:LiteralMeta” is not a defined class
cboettig commented 8 years ago

Thanks @xu-hong, I'll try and take a look soon.

cboettig commented 8 years ago

@xu-hong my bad, looks like I hadn't committed and pushed a change. Can you try again?

rvosa commented 8 years ago

I have now deployed the validator (and most of the rest of the website) to the new host. For now I haven't updated the DNS records yet so you'll have to use http://162.13.187.155/

Please let me know how you get on. I've tested the thing with the xsi:type not needing a prefix and that now validates correctly so all should be well.

If you get the chance to click some other buttons on the website to see if the deployment all worked I would be very grateful. I think there are two bad bits for now:

  1. I haven't tested any of the other (conversion) service, mostly because I don't think people are using them so I didn't bother
  2. for the schema documentation some of the menu items on the right hand side still have the old URLs. I'd have to regenerate the static HTML to fix that.
cboettig commented 8 years ago

@rvosa The new validator has been great, lemme know when the DNS records are updated, I've hardcoded the ip in the validator function temporarily.

rvosa commented 8 years ago

The DNS records are updated, the original url(s) are now in effect for the updated validator: http://www.nexml.org/nexml/phylows/validator - of this url, the following fragments are actually optional: the www can be omitted, and the nexml path fragment also, e.g. http://nexml.org/phylows/validator works.

rvosa commented 8 years ago

Oh, and the issues I mentioned here have now been addressed: the other services work as intended, and the links in the right hand side menu have been updated.

cboettig commented 8 years ago

Awesome, thanks!