mmisw / orr-portal

ORR Frontend component
Apache License 2.0

GCMD Science Keywords #78

Closed carueda closed 7 years ago

carueda commented 7 years ago

(Issue reported by T. Stevens on 8/10/2016 -- capturing the email thread here --with some minor editing-- for easier reference.)

Tom,

I uploaded the GCMD Science Keywords to the Community Ontology Repository (http://cor.esipfed.org/ont/), but there seem to be some issues.

I don’t see the full extent of the keywords when I click on them. I uploaded an RDF file I got from our keyword file directory at http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/.

Can you provide any guidance on what I may be doing wrong?

The RDF version of our keywords can be found at: http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.rdf

Thanks,

Tyler Stevens

carueda commented 7 years ago

(8/26/2016)

... the lack of a proper ontology with corresponding URI in the file seems to be at least part of the issue.

Summary of some testing I just did:

More details:

<?xml version="1.0"?>
<rdf:RDF xmlns="file:/Users/carueda/Desktop/X-DOMES/Misc_ontologies/sciencekeywords.rdf"
     xml:base="file:/Users/carueda/Desktop/X-DOMES/Misc_ontologies/sciencekeywords.rdf"
     xmlns:gcmd="http://gcmd.gsfc.nasa.gov/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:skos="http://www.w3.org/2004/02/skos/core#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
    <owl:Ontology/>
</rdf:RDF>

The first step upon the upload of the file itself is to indicate the associated ontology URI, which is required as the identification for the submission:

image

COR shows two possible URIs, although in this case neither is applicable. I see that http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/ was entered in Tyler’s submission. I’m doing the same here now and clicking Next:

image

COR does not find any associated metadata because there’s actually no ontology in the file having the URI http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/
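
For reference, a minimal Python sketch of this kind of check. It uses only the standard library, not the OWL API or Jena libraries that COR relies on, and it lists the rdf:about URI of every owl:Ontology element in an RDF/XML document. The function name and the inline sample are illustrative assumptions; the sample mirrors the empty ontology shown above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def declared_ontology_uris(rdf_xml: str) -> list:
    """Return the rdf:about value (None if absent) of each owl:Ontology element."""
    root = ET.fromstring(rdf_xml)
    return [
        element.get(f"{{{RDF_NS}}}about")
        for element in root.iter(f"{{{OWL_NS}}}Ontology")
    ]


# Illustrative sample: an owl:Ontology element that carries no URI at all.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
    <owl:Ontology/>
</rdf:RDF>"""

print(declared_ontology_uris(sample))
```

A list containing only None (or an empty list) means the file declares no ontology URI, so the URI entered in the registration form cannot match anything in it.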

Now, to be able to complete this test submission, I’m changing the URI to http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords2/ (so it will be a brand new entry). I pick the “re-hosted” option and the next page shows:

image

I click “Complete registration” and COR indicates “Successful registration”. Then I go to http://cor.esipfed.org/ont/?uri=http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords2/, and so I can pretty much reproduce the behavior observed with the original submission.

carueda commented 7 years ago

(8/26/2016-b)

By looking at the COR logs:

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">

<gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
See http://gcmd.nasa.gov/r/l/TermsOfUse</gcmd:termsOfUse>
<gcmd:keywordVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">8.4.1</gcmd:keywordVersion>
<gcmd:schemeVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">2016-08-02 13:08:49</gcmd:schemeVersion>

<skos:Concept rdf:about="1eb0ea0a-312c-4d74-8d42-6f1ad758f999"
   xml:base="http://gcmdservices.gsfc.nasa.gov/kms/concept/"
...

we can see that the error reported above (by the Jena library) points to the part starting with See.

So, I just did a quick test of removing those 3 similar lines and now the full contents of the file (except for those 3 lines) are loaded for the submission, see http://cor.esipfed.org/ont?uri=http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords3/
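
A rough Python sketch of that quick test, assuming the offending lines are exactly the direct children of rdf:RDF that hold only character data. This is a standard-library illustration, not the procedure actually used for the test; the function name and the shortened sample are hypothetical.

```python
import xml.etree.ElementTree as ET

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
# Keep the skos prefix on output (ElementTree already knows the rdf and xml prefixes).
ET.register_namespace("skos", SKOS_NS)


def drop_text_only_top_level(rdf_xml: str) -> str:
    """Remove direct children of the root that contain text but no child elements."""
    root = ET.fromstring(rdf_xml)
    for child in list(root):  # iterate over a copy because the tree is mutated
        if len(child) == 0 and (child.text or "").strip():
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


# Shortened, illustrative version of the header discussed above.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">See http://gcmd.nasa.gov/r/l/TermsOfUse</gcmd:termsOfUse>
<gcmd:keywordVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">8.4.1</gcmd:keywordVersion>
<gcmd:schemeVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">2016-08-02 13:08:49</gcmd:schemeVersion>
<skos:Concept rdf:about="1eb0ea0a-312c-4d74-8d42-6f1ad758f999"
   xml:base="http://gcmdservices.gsfc.nasa.gov/kms/concept/">
   <skos:prefLabel xml:lang="en">Science Keywords</skos:prefLabel>
</skos:Concept>
</rdf:RDF>"""

cleaned = drop_text_only_top_level(sample)
print(cleaned)
```

Only the skos:Concept element survives in the output, which corresponds to the file contents that loaded for the sciencekeywords3 test entry.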

carueda commented 7 years ago

(9/2/2016)

Hi All,

Thank you for the feedback. I will bring this back to our developers to take a look at the RDF representation of the keywords.

Tyler

carueda commented 7 years ago

Today 2/8/17, I just tried again after noting a pretty recent timestamp for http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.rdf (08-Feb-2017 18:37).

A similar error can still be seen in the ORR logs when trying to recognize this file as being in RDF/XML format:

ERROR org.apache.jena.riot - {E202} Expecting XML start or end element(s). 
String data "http://gcmd.nasa.gov/r/l/TermsOfUse" not allowed. Maybe a striping error.

The only difference in this particular line is that the See part has been removed in this new file:

<gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
http://gcmd.nasa.gov/r/l/TermsOfUse</gcmd:termsOfUse>
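
Some background on the "striping" hint in the message: RDF/XML alternates node elements and property elements. The direct children of rdf:RDF are read as node elements, and a node element may contain only property elements, so bare character data at that level is rejected. The Python sketch below (standard library only; the function name and the trimmed sample are assumptions for illustration) reports which top-level elements have that shape, without attempting a full RDF parse.

```python
import xml.etree.ElementTree as ET


def text_only_top_level(rdf_xml: str) -> list:
    """List (tag, text) for direct children of the root that hold only character data."""
    root = ET.fromstring(rdf_xml)
    return [
        (child.tag, child.text.strip())
        for child in root
        if len(child) == 0 and (child.text or "").strip()
    ]


# Trimmed, illustrative copy of the header in the newer file.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
http://gcmd.nasa.gov/r/l/TermsOfUse</gcmd:termsOfUse>
<skos:Concept rdf:about="1eb0ea0a-312c-4d74-8d42-6f1ad758f999"
   xml:base="http://gcmdservices.gsfc.nasa.gov/kms/concept/">
   <skos:prefLabel xml:lang="en">Science Keywords</skos:prefLabel>
</skos:Concept>
</rdf:RDF>"""

for tag, text in text_only_top_level(sample):
    print(tag, "->", text)
```

Here the only element reported is gcmd:termsOfUse, the same element the Jena message points at.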

jceaser commented 7 years ago

Is there an online validator we (GCMD) can use, or is the only available way to test to try loading the URL into the software?


Greetings from GCMD Development Team!!!!

carueda commented 7 years ago

Hi Thomas,

Thanks for the feedback! Yes, there are online validators. A summary of the previous validation exercise and of a new one I just did is enclosed below. In summary, one validator (based on the OWL API library) seems to succeed, while the other (which seems to be based on the Jena library) reports errors similar to those I described. In any case, I'll be looking into doing some updates in the ORR software, in particular to upgrade the Jena library (which is the primary one for ontology parsing/processing in the ORR). Then I will repeat the registration exercise and report here.


  1. In my initial testing I used http://mowl-power.cs.man.ac.uk:8080/validator/, with the resulting failure shown in the screenshot attached to the thread above. But I just tried again and the ontology seems to now be passing ok with this particular validator.

    What I just did was to enter http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.rdf in the form and click Validate (screenshot 2017-02-10_0819), with the result shown in screenshot 2017-02-10_0820.

    Although the error is gone, it is not clear from the validator whether the contents were loaded successfully. Again, the current situation at the COR is that the file is processed by the OWL API library (what this validator uses), but with no contents actually loaded.

  2. I also just tried https://www.w3.org/RDF/Validator/. In this case the output reflects what the COR is reporting in the backend logs at the time the Apache Jena library is used (screenshot 2017-02-10_0825).

    I'm not sure how up-to-date this particular validator is ... Anyway, I'll try to update our own version of the Jena library and repeat the exercise.

carueda commented 7 years ago

Using a separate program, I just repeated the load exercise with the latest versions of both Jena (3.2.0) and OWL-API (5.0.5).

With the following reduced contents of the file (just keeping one of the skos:Concepts, but with the same header section --with linefeeds added for legibility):

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
>
<gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
  http://gcmd.nasa.gov/r/l/TermsOfUse
</gcmd:termsOfUse>
<gcmd:keywordVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
  8.4.1
</gcmd:keywordVersion>
<gcmd:schemeVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
  2017-01-26 14:47:47
</gcmd:schemeVersion>

<skos:Concept rdf:about="1eb0ea0a-312c-4d74-8d42-6f1ad758f999"
   xml:base="http://gcmdservices.gsfc.nasa.gov/kms/concept/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
   <skos:inScheme rdf:resource="http://gcmdservices.gsfc.nasa.gov/kms/concepts/concept_scheme/sciencekeywords"/>
   <skos:prefLabel xml:lang="en">Science Keywords</skos:prefLabel>
   <skos:narrower rdf:resource="e9f67a66-e9fc-435c-b720-ae32a2c3d8f5"/>
   <skos:narrower rdf:resource="894f9116-ae3c-40b6-981d-5113de961710"/>
   <skos:changeNote>2015-10-09 14:16:54.0 [gee-cee] Remove Concepts 
delete narrower relation (null);
   </skos:changeNote>
</skos:Concept>
</rdf:RDF>
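
Note that the rdf:about and rdf:resource values in this reduced file are relative references, resolved against the xml:base given on the skos:Concept element. A small Python sketch of that resolution, using urljoin from the standard library as a stand-in for what an RDF parser does (the helper name is an assumption):

```python
from urllib.parse import urljoin

# xml:base declared on the skos:Concept element in the reduced file.
XML_BASE = "http://gcmdservices.gsfc.nasa.gov/kms/concept/"


def resolve(relative_ref: str, base: str = XML_BASE) -> str:
    """Resolve a relative rdf:about / rdf:resource value against the base URI."""
    return urljoin(base, relative_ref)


for ref in (
    "1eb0ea0a-312c-4d74-8d42-6f1ad758f999",  # rdf:about of the concept
    "e9f67a66-e9fc-435c-b720-ae32a2c3d8f5",  # first skos:narrower target
):
    print(resolve(ref))
```

Each concept therefore ends up with a full URI under http://gcmdservices.gsfc.nasa.gov/kms/concept/.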

The results are basically similar to previous tests:

jceaser commented 7 years ago

I have assigned this project to a developer for this sprint; he should start working on it this week, and I pointed him to this ticket for feedback. Official responses will come from Tyler.

chrisgokey commented 7 years ago

Hi Carlos,

I've been looking into your issue and I put a "test" version on gcmdservices that I was wondering if you could try:

http://gcmdservices.gsfc.nasa.gov/kms_test/concepts/concept_scheme/sciencekeywords.rdf

http://gcmdservices.gsfc.nasa.gov/kms_test/concepts/concept_scheme/sciencekeywords.owl

I think the main issue you are having stems from the "terms of use" XML in the RDF; even though it was namespaced, it doesn't look like the validators like it... So we commented it out.

I tried the first link using the W3C RDF validator and was able to see triples back and no errors in validation.

The second link is an OWL representation generated from the service, and it does validate using that W3C OWL validator you mentioned above. I'm not too familiar with OWL, but I think you'll have better luck using this format.

Let me know how it goes.. If it works we'll graduate this test webapp to the production version.

Thanks, Chris

carueda commented 7 years ago

Hi Chris,

Thanks. I just tried them:

http://gcmdservices.gsfc.nasa.gov/kms_test/concepts/concept_scheme/sciencekeywords.rdf

No issues at all uploading this one to the COR.

Based on the contents, I just used http://gcmdservices.gsfc.nasa.gov/kms/concept as the URI for the registration,

http://cor.esipfed.org/ont/?uri=http://gcmdservices.gsfc.nasa.gov/kms/concept

but any desired URI can be used for the real submission.

http://gcmdservices.gsfc.nasa.gov/kms_test/concepts/concept_scheme/sciencekeywords.owl

This one also uploads with no errors:

http://cor.esipfed.org/ont/?uri=http://gcmdservices.gsfc.nasa.gov/kms/concepts/concept_scheme/sciencekeywords

(where I used the URI of the detected ontology, but without the trailing ?format=owl)
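
A minimal sketch of that adjustment in Python, assuming the detected ontology URI is the registered one followed by ?format=owl (the full detected string is inferred here, and the helper name is hypothetical):

```python
from urllib.parse import urlsplit, urlunsplit


def without_query(uri: str) -> str:
    """Return the URI with its query string removed; all other parts are kept."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(query=""))


# Assumed form of the detected ontology URI.
detected = (
    "http://gcmdservices.gsfc.nasa.gov/kms/concepts/"
    "concept_scheme/sciencekeywords?format=owl"
)
print(without_query(detected))
```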

Since in sciencekeywords.owl all (3,031) concepts are referenced via owl:imports, each target ontology would also have to be registered if all science keyword semantic information is to be captured in COR. Besides potentially generating a significant overhead on the system, this seems unnecessary.
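
To see how many separate ontologies such a file pulls in, the owl:imports statements can be listed with a few lines of Python. This is a standard-library sketch, not part of the COR software; the function name is an assumption and the two import targets in the sample are placeholder URIs, not real GCMD concept URIs.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def imported_ontology_uris(owl_xml: str) -> list:
    """Return the rdf:resource target of every owl:imports element in the document."""
    root = ET.fromstring(owl_xml)
    return [
        element.get(f"{{{RDF_NS}}}resource")
        for element in root.iter(f"{{{OWL_NS}}}imports")
    ]


# Illustrative sample; the imported URIs are hypothetical placeholders.
sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://gcmdservices.gsfc.nasa.gov/kms/concepts/concept_scheme/sciencekeywords">
    <owl:imports rdf:resource="http://example.org/concept/a"/>
    <owl:imports rdf:resource="http://example.org/concept/b"/>
  </owl:Ontology>
</rdf:RDF>"""

imports = imported_ontology_uris(sample)
print(len(imports), "imported ontologies")
```

Run against the real sciencekeywords.owl, the length of this list would be the number of additional ontologies that would need their own registration.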

So, I suggest using the sciencekeywords.rdf file (or a similar one if you are still doing adjustments) for the registration, as it already contains all concepts directly. That way there would be only one science keyword ontology entry in the COR (not 3K!).

Apart from any further questions, of course, please let me know when you think the file is "officially" in production. I can perform the actual registration, but you are welcome to go ahead with it. Before the submission I can create a "GCMD" organization and add you (and anyone else you indicate) to that organization. Then any member of the organization can perform the registration and specify "GCMD" as the owner of the registered ontology. In this way we could use your help testing the system!

Thanks again, -c