samvera-deprecated / rdf-vocab

Shared RDF Vocabularies (rdf-vocab <= v0.7.0)
https://github.com/ruby-rdf/rdf-vocab
Apache License 2.0

Attempting to add Darwin Core vocabulary - error #35

Closed jhallida closed 9 years ago

jhallida commented 9 years ago

Hi, we are attempting to add Darwin Core metadata for a Hydra project we are working on here at Indiana University. My addition to the vocab.yml file is as follows:

dwc:
  class_name: DWC
  uri: http://rs.tdwg.org/dwc/terms/
  source: http://rs.tdwg.org/dwc/rdf/dwcterms.rdf

The error when running the rake task to generate the DWC vocab file is: "root must be a proxy not a NilClass". I'm sure I'm just missing something fundamental here but I'm not sure what it is.

dchandekstark commented 9 years ago

I get the same thing -- it could be a bug in rdf-rdfxml? @no-reply @barmintor @terrellt

root must be a proxy not a NilClass
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-rdfxml-1.1.3/lib/rdf/rdfxml/reader.rb:186:in `each_statement'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/mixin/writable.rb:129:in `insert_statements'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/mixin/mutable.rb:53:in `block in load'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-rdfxml-1.1.3/lib/rdf/rdfxml/reader.rb:165:in `call'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-rdfxml-1.1.3/lib/rdf/rdfxml/reader.rb:165:in `block in initialize'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/reader.rb:207:in `instance_eval'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/reader.rb:207:in `initialize'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-rdfxml-1.1.3/lib/rdf/rdfxml/reader.rb:140:in `initialize'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/reader.rb:148:in `new'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/reader.rb:148:in `block in open'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/util/file.rb:170:in `open_file'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/reader.rb:136:in `open'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/mixin/mutable.rb:43:in `load'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/model/graph.rb:75:in `block in load'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/model/graph.rb:125:in `call'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/model/graph.rb:125:in `initialize'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/model/graph.rb:74:in `new'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/model/graph.rb:74:in `load'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/vocab.rb:319:in `load'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/gems/rdf-1.1.9/lib/rdf/cli/vocab-loader.rb:141:in `run'
/Users/dc/github/projecthydra-labs/rdf-vocab/lib/rdf-vocab.rb:18:in `generate'
/Users/dc/github/projecthydra-labs/rdf-vocab/lib/rdf-vocab/tasks/vocab.rake:14:in `block (3 levels) in <top (required)>'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/bin/ruby_executable_hooks:15:in `eval'
/Users/dc/.rvm/gems/ruby-2.1.2@rdf-vocab/bin/ruby_executable_hooks:15:in `<main>'

jcoyne commented 9 years ago

Looks like a malformed doctype in http://rs.tdwg.org/dwc/rdf/dwcterms.rdf:

<!DOCTYPE rdf:RDF [

<!ENTITY rdfns 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<!ENTITY rdfsns 'http://www.w3.org/2000/01/rdf-schema#'>
<!ENTITY dctermsns 'http://purl.org/dc/terms/'>
<!ENTITY dctypens 'http://purl.org/dc/dcmitype/'>
<!ENTITY dwcattributesns 'http://rs.tdwg.org/dwc/terms/attributes/'>
<!--
<!ENTITY skosns 'http://www.w3.org/2004/02/skos/core#'>
<!ENTITY vsns 'http://www.w3.org/2003/06/sw-vocab-status/ns#'>
-->
]>

dchandekstark commented 9 years ago

@jcoyne That's weird - W3C validator parsed it fine.

jcoyne commented 9 years ago

Okay, Firefox is telling me: "Bogus doctype. Stray doctype."

gkellogg commented 9 years ago

I don't think DOCTYPE is the issue; it may be a change in Nokogiri. The reader calls ::Nokogiri::XML.parse with the document IO, the base URI, and 'utf-8', and gets back an empty document. It works fine if the document is first read into a string and parsed from that, or if 'utf-8' isn't passed.

File a bug on http://github.com/ruby-rdf/rdf-rdfxml/issues and I'll investigate further.

dchandekstark commented 9 years ago

Thanks, @gkellogg!

jhallida commented 9 years ago

Thanks all for the quick responses! I will wait for the changes to trickle down to production code and then try again.

acoburn commented 9 years ago

@jhallida there is a workaround for this. The issue appears to be due to the byte-order mark (BOM) at the start of the file. If you download the Darwin Core file locally and remove the BOM prefix, you can run the following command:

vocab-fetch --module-name RDF::Vocab --class-name DWC --uri http://rs.tdwg.org/dwc/terms/ --source dwcterms.rdf > lib/rdf-vocab/vocab/dwc.rb

@dchandekstark what would you think of repeating this process locally in order to get the DWC vocabulary into this module? It would mean that

rake vocab:dwc

would not succeed, but the vocabulary would be present.
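The BOM-removal step in the workaround above can be sketched in Ruby as follows. This is a minimal sketch, not part of rdf-vocab or RDF.rb: the helper names `strip_utf8_bom` and `strip_bom_from_file!` are hypothetical, and it assumes the file is UTF-8 with the three-byte BOM (EF BB BF) at the very start.

```ruby
# The UTF-8 byte-order mark as raw bytes (EF BB BF).
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: returns a binary copy of `bytes` without a leading
# UTF-8 BOM. Input without a BOM comes back unchanged (as a binary copy).
def strip_utf8_bom(bytes)
  data = bytes.b
  data.start_with?(UTF8_BOM) ? data.byteslice(UTF8_BOM.bytesize..-1) : data
end

# Hypothetical helper: rewrites the file at `path` in place without its BOM.
# Returns true if a BOM was removed, false if the file was left untouched.
def strip_bom_from_file!(path)
  original = File.binread(path)
  cleaned = strip_utf8_bom(original)
  return false if cleaned.bytesize == original.bytesize

  File.binwrite(path, cleaned)
  true
end

# Usage, assuming dwcterms.rdf was downloaded into the current directory:
#   strip_bom_from_file!("dwcterms.rdf")
```

After running this on the downloaded copy, the vocab-fetch command above can be pointed at the cleaned dwcterms.rdf.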

dchandekstark commented 9 years ago

Hm, can we push this back to @gkellogg and ruby-rdf/rdf-rdfxml#35 ?

gkellogg commented 9 years ago

If there's a problem with the file such that it can't be deserialized without modification, there's nothing to be done in RDF.rb. In such cases, I usually make a local modified copy and use that as the source. Unless you can correct the source, or determine that there's a bug in the reader, I can't think of anything else.

barmintor commented 9 years ago

If it really is tripping on the BOM in an otherwise correctly-encoded UTF8 stream, there's a trick in Ruby 1.9: File.open(path,"r:bom|utf-8")

Not sure if it persisted into Ruby 2.
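For reference, a minimal sketch of that open-mode trick. As far as I know the "BOM|UTF-8" external-encoding prefix is still accepted by Ruby 2 and later; the helper name `read_utf8_without_bom` is hypothetical, and this only shows reading a local file, not how the RDF/XML reader opens its input.

```ruby
# Hypothetical helper: reads a file as UTF-8. With the "BOM|UTF-8" mode,
# Ruby checks for a leading byte-order mark and skips it if one is present,
# so the returned string starts at the first real character.
def read_utf8_without_bom(path)
  File.open(path, "r:BOM|UTF-8") { |file| file.read }
end

# Usage, assuming a local copy of the vocabulary file:
#   xml = read_utf8_without_bom("dwcterms.rdf")
#   xml.encoding  # Encoding::UTF_8
```

Files without a BOM are read normally with the same mode, so the helper is safe to use either way.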

On Fri, Mar 6, 2015 at 1:09 AM, Gregg Kellogg notifications@github.com wrote:

If there's a problem with the file such that it can't be deserialized without modification, there's nothing to be done in RDF.rb. In such cases, I usually make a local modified copy and use that as the source. Unless you can correct the source, or determine that there's a bug in the reader, I can't think of anything else.


dchandekstark commented 9 years ago

Thanks, @gkellogg and @barmintor. I'm not familiar with the phenomenon, so just kicked it around.

So, I think @jhallida has a workaround. Before adding the vocab to rdf-vocab, perhaps we should contact the DWC folks about this issue. Maybe they'll issue a fix. If that doesn't happen, I suppose we could create a place to put source files for cases like this one (the config entry could point to the path relative to that directory). @acoburn ?