Validation? - Githubissues

mjgiarlo commented 10 years ago

@dchandekstark should clarify this question.

dchandekstark commented 10 years ago

@mjgiarlo Thinking about whether and how contributed vocabs are "validated" for accuracy, completeness, etc.

acoburn commented 10 years ago

@dchandekstark Any chance that a contributed vocab could be validated against an OWL representation? Also, should vocab properties contain type:, subClassOf:, domain:, range:, etc. attributes where available?

mjgiarlo commented 10 years ago

@acoburn sounds like you two should be on the Hydra RDF Working Group! ;) Keep an eye out for details about our next call, post-OR2014.

dchandekstark commented 10 years ago

@acoburn I'm sure others know more about validation issues specific to this domain than I. As a user of the library I would just like to have some assurance that the code is a valid representation of the vocabulary. Ideally these validations would be built into a test suite in some fashion.

no-reply commented 10 years ago

The better approach, I think, is the one employed by rdf.rb. Rather than validating, simply generate the vocabs from OWL. It would be a good idea to get more property/class attributes into the vocabularies. I think the Vocabulary class should be extensible, but I would have to check.


acoburn commented 10 years ago

I agree, but using vocab-fetch from rdf.rb only ever gives me empty class definitions with MADS, MODS and the other LoC ontologies that I have tried.

dchandekstark commented 10 years ago

@no-reply I echo @acoburn on that. Maybe you could say more?

acoburn commented 10 years ago

I take that back about MADS and MODS. Using the vocab-fetch script from rdf.rb, I was able to generate correct vocab classes for both MADS and MODS (once I got the namespace values sorted out).

$ ruby vocab-fetch --uri http://www.loc.gov/mads/rdf/v1# --source http://www.loc.gov/standards/mads/rdf/v1.rdf --class-name MADS

and

$ ruby vocab-fetch --uri http://www.loc.gov/mods/rdf/v1# --source http://www.loc.gov/mods/modsrdf/v1/modsrdf.owl --class-name MODS

A related design question would be: would rdf-vocab be a collection of actual ruby RDF::Vocabulary classes (as it is now) or, instead, a collection of OWL ontologies that generate vocabularies when the user installs the package?

Or, to put this another way, if it is so easy to generate vocabularies from source OWL files, couldn't this process become part of an AF/Hydra generator? Or simply part of some good documentation?

dchandekstark commented 10 years ago

@acoburn I'm not sure that every vocab we might want is currently available as an OWL ontology, but I suppose for those that are, the vocab classes wouldn't have to be pre-generated.

dchandekstark commented 10 years ago

The more I think about it, the more I like the idea of not pre-generating vocabs for which there are OWL docs. Instead we could have a rake task or tasks and config file of URIs and sources, etc. If y'all are on board with this general direction, maybe we call the validation issue resolved for the time being and move on to fleshing out the functional details of vocab generation?
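A minimal sketch of what such a config file and task could look like, assuming a YAML file keyed by class name. The constant `VOCAB_CONFIG` and the helpers `vocab_fetch_args` and `vocab_fetch_commands` are hypothetical names introduced for illustration; the `--uri`, `--source`, and `--class-name` flags and the two entries are taken from the vocab-fetch commands quoted earlier in this thread.

```ruby
require "yaml"
require "shellwords"

# Hypothetical config: one entry per vocabulary, keyed by class name.
# The URIs and sources are the MADS and MODS values from this thread.
VOCAB_CONFIG = <<~YAML
  MADS:
    uri: "http://www.loc.gov/mads/rdf/v1#"
    source: "http://www.loc.gov/standards/mads/rdf/v1.rdf"
  MODS:
    uri: "http://www.loc.gov/mods/rdf/v1#"
    source: "http://www.loc.gov/mods/modsrdf/v1/modsrdf.owl"
YAML

# Build the argument list for a single vocab-fetch run.
def vocab_fetch_args(class_name, entry)
  ["ruby", "vocab-fetch",
   "--uri", entry.fetch("uri"),
   "--source", entry.fetch("source"),
   "--class-name", class_name]
end

# Turn the whole config into one argument list per vocabulary.
def vocab_fetch_commands(yaml_text)
  YAML.safe_load(yaml_text).map { |name, entry| vocab_fetch_args(name, entry) }
end

# Print the commands; a rake task could pass each list to system(*args) instead.
vocab_fetch_commands(VOCAB_CONFIG).each { |args| puts args.shelljoin }
```

Keeping the commands as argument arrays rather than strings avoids shell-quoting problems with the `#` in the namespace URIs.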

jpstroop commented 10 years ago

+.5. I think it might make sense to cache the ontologies/RDFS as fixtures and have the rake task build the classes from those--this way we're not dependent on every site that hosts a vocab being up all the time.
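The fixture idea could be sketched roughly as follows. This is only an illustration under assumptions: the helper name `cached_source`, the `.rdf` file extension, and the injected `fetcher` callable are all hypothetical, and a stand-in fetcher is used so the sketch runs without network access.

```ruby
require "fileutils"
require "tmpdir"

# Return the cached copy of a vocabulary source, fetching it only when no
# fixture exists yet. `fetcher` is any callable that takes a URL and returns
# the document body (in real use it might wrap Net::HTTP.get; assumption).
def cached_source(name, url, fixture_dir:, fetcher:)
  path = File.join(fixture_dir, "#{name}.rdf")
  return File.read(path) if File.exist?(path)

  body = fetcher.call(url)
  FileUtils.mkdir_p(fixture_dir)
  File.write(path, body)
  body
end

# Demonstration with a fake fetcher that counts how often it is called.
Dir.mktmpdir do |dir|
  calls = 0
  fake_fetcher = lambda do |_url|
    calls += 1
    "<rdf:RDF/>" # stand-in for a real RDF/OWL document
  end

  2.times do
    cached_source("mads", "http://www.loc.gov/standards/mads/rdf/v1.rdf",
                  fixture_dir: dir, fetcher: fake_fetcher)
  end
  puts "remote fetches performed: #{calls}"
end
```

Once the fixture exists, the second call reads from disk, so class generation no longer depends on the remote host being up.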

dchandekstark commented 10 years ago

Not sure if this addresses your concern, Jon, but I was imagining that the generated vocabs would be stored somewhere, either in the installed gem or the target app. So once the initial generation happens, there would be no dependency on remote host uptime. OTOH I'm certainly not invested in this approach, more interested in a general consensus at this point.

dchandekstark commented 10 years ago

Jon, if we did that caching, how about also checking the remote sites, if available, for comparison?

jpstroop commented 10 years ago

I tend to think we should cache the source vocab when we can (when could we not? Situations where it's too big to be practical?), but it probably does make sense to have the class generated when the gem is installed...this way the output of the class creation can be in line with the version of rdf.rb that the application is using (right? So long as the API for fetching vocabs doesn't change, I guess).

-Js


awead commented 10 years ago

You could store the OWL file directly as you would an asset. If generating vocabs is trivial enough, why not do it "on the fly"? Querying the vocab would load the OWL file each time, and creating new vocabularies would be a simple matter of loading a new OWL doc.

I'm more in favor of generating the Ruby code. As OWLs change, you can re-generate the class, commit the Ruby code, and release a new version of the gem. If you're concerned about staying in step with the current OWL, you could keep the file's checksum in the class and use it to check it against the online version. A superclass with checksum and version methods would help keep track of that.

I do see the advantage of caching the OWLs, but why not take the additional step of storing them such that they're available to the user?
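A rough sketch of the checksum idea, assuming SHA-256 as the digest. The class names `GeneratedVocabulary` and `ExampleVocab`, the `generated_from` and `stale?` methods, and the placeholder OWL text are all hypothetical; this is not rdf.rb's API, and a real superclass would presumably build on RDF::Vocabulary.

```ruby
require "digest"

# Stand-in for the text of the OWL document a vocabulary was generated from.
ORIGINAL_OWL = "<owl:Ontology/>"

# Hypothetical base class: each generated vocabulary records the checksum
# and version of its source document at generation time.
class GeneratedVocabulary
  class << self
    attr_reader :source_checksum, :source_version

    # Called once in the generated subclass.
    def generated_from(checksum:, version:)
      @source_checksum = checksum
      @source_version = version
    end

    # True when the currently published source no longer matches the
    # document this class was generated from.
    def stale?(current_source)
      Digest::SHA256.hexdigest(current_source) != source_checksum
    end
  end
end

# What a generated class might declare (the version string is made up).
class ExampleVocab < GeneratedVocabulary
  generated_from checksum: Digest::SHA256.hexdigest(ORIGINAL_OWL), version: "1.0"
end

puts ExampleVocab.stale?(ORIGINAL_OWL)
puts ExampleVocab.stale?(ORIGINAL_OWL + "<!-- edited upstream -->")
```

A check like `stale?` could run in a rake task or test against the online version, signalling when it is time to re-generate the class and release a new gem version.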

dchandekstark commented 10 years ago

There are a number of good ideas in this thread. We have wandered away from the validation question strictly speaking, but that's OK. I think we need a live conversation to pull it all together into a plan. Not sure if the next Hydra RDF WG call is appropriate. If not, maybe a Google hangout or something.

mjgiarlo commented 10 years ago

IMO, the next Hydra RDF WG call is appropriate. Pinging @no-reply.

dchandekstark commented 9 years ago

I think we've basically solved the "validation" question by relying on ruby-rdf's vocab-loader and authoritative source documents. I'm going to close this issue and open another one about storing source files.

samvera-deprecated / rdf-vocab

Validation? #7