carueda closed this issue 7 years ago
(8/26/2016)
... the lack of a proper ontology with corresponding URI in the file seems to be at least part of the issue.
Summary of some testing I just did:
More details:
<owl:Ontology/>:
<?xml version="1.0"?>
<rdf:RDF xmlns="file:/Users/carueda/Desktop/X-DOMES/Misc_ontologies/sciencekeywords.rdf"
xml:base="file:/Users/carueda/Desktop/X-DOMES/Misc_ontologies/sciencekeywords.rdf"
xmlns:gcmd="http://gcmd.gsfc.nasa.gov/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
<owl:Ontology/>
</rdf:RDF>
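As a rough illustration of the "no ontology URI" point, the following sketch lists the rdf:about value of every owl:Ontology element in an RDF/XML string. It is not what COR does internally; the function name and the trimmed sample are assumptions made for this example, and it only looks at the typed owl:Ontology element form (not rdf:Description with an rdf:type).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def ontology_uris(rdf_xml: str) -> list:
    """Return the rdf:about value (or None) of each owl:Ontology element.

    Hypothetical helper: only the typed-element form <owl:Ontology .../>
    is inspected.
    """
    root = ET.fromstring(rdf_xml)
    return [
        el.get(f"{{{RDF_NS}}}about")
        for el in root.iter(f"{{{OWL_NS}}}Ontology")
    ]


# Trimmed version of the file shown above (sample for illustration only).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology/>
</rdf:RDF>"""

print(ontology_uris(SAMPLE))  # [None]
```

A result of `[None]` means an ontology element exists but carries no URI, so there is nothing a registry could offer as a candidate identifier.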
A quick validation with some online tool out there seems to indicate errors with this file, for example:
I tried uploading the file in COR (I know there is already an entry there but wanted to exercise the sequence):
The first step upon the upload of the file itself is to indicate the associated ontology URI, which is required as the identification for the submission:
COR shows two possible URIs, although in this case neither is applicable. I see that http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/ was entered in Tyler’s submission. I’m doing the same here right now and click next:
COR does not find any associated metadata because there’s actually no
Now, to be able to complete this test submission, I’m changing the URI to http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords2/ (so it will be a brand new entry). I pick the “re-hosted” option and the next page shows:
I click “Complete registration” and COR indicates “Successful registration”. I then go to http://cor.esipfed.org/ont/?uri=http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords2/ and can pretty much reproduce the behavior observed with the original submission.
(8/26/2016-b)
By looking at the COR logs:
COR tries to automatically recognize the format of the uploaded file. The first format attempted, RDF/XML, is the one that should work for this file.
However the log shows the error:
ERROR org.apache.jena.riot - {E202} Expecting XML start or end element(s).
String data "See http://gcmd.nasa.gov/r/l/TermsOfUse" not allowed. Maybe a striping error.
so COR continues trying other formats ("n3", "nt", "ttl", "rj", "jsonld", …) with OWL/XML eventually “succeeding”. This uses the OWL API library (also used by Protégé). For some reason, this library does not completely load the file (in any case, it does not trigger any error).
Anyway, as already said, the appropriate format should have been RDF/XML. So, looking more closely at the first few lines of the file (inserting some newlines just for legibility):
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
See http://gcmd.nasa.gov/r/l/TermsOfUse</gcmd:termsOfUse>
<gcmd:keywordVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">8.4.1</gcmd:keywordVersion>
<gcmd:schemeVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">2016-08-02 13:08:49</gcmd:schemeVersion>
<skos:Concept rdf:about="1eb0ea0a-312c-4d74-8d42-6f1ad758f999"
xml:base="http://gcmdservices.gsfc.nasa.gov/kms/concept/"
...
we can see that the error reported above (by the Jena library) points to the part starting with "See".
So, I just did a quick test of removing those 3 similar lines and now the full contents of the file (except for those 3 lines) are loaded for the submission, see http://cor.esipfed.org/ont?uri=http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords3/
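For context, in RDF/XML the direct children of rdf:RDF are read as node elements, and a node element may only contain property elements, not character data. That is why the three gcmd lines trip the parser. The quick test above was done by hand; a minimal sketch of the same removal, using only Python's standard library, might look like the following. The function name and the trimmed sample are assumptions for this example, not part of COR or the GCMD tooling.

```python
import xml.etree.ElementTree as ET

# Keep readable prefixes when serializing the cleaned document.
ET.register_namespace("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#")
ET.register_namespace("skos", "http://www.w3.org/2004/02/skos/core#")


def drop_text_only_children(rdf_xml: str) -> str:
    """Remove children of rdf:RDF that carry character data directly.

    Hypothetical helper: any top-level element with non-whitespace text
    is dropped; everything else is kept as is.
    """
    root = ET.fromstring(rdf_xml)
    for child in list(root):  # copy, since we mutate root while looping
        if child.text and child.text.strip():
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


# Trimmed sample based on the header shown above (illustration only).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">See http://gcmd.nasa.gov/r/l/TermsOfUse</gcmd:termsOfUse>
  <gcmd:keywordVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">8.4.1</gcmd:keywordVersion>
  <skos:Concept rdf:about="http://gcmdservices.gsfc.nasa.gov/kms/concept/1eb0ea0a-312c-4d74-8d42-6f1ad758f999">
    <skos:prefLabel xml:lang="en">Science Keywords</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

cleaned = drop_text_only_children(SAMPLE)
print(cleaned)
```

Dropping the elements loses the terms-of-use and version information, so this is only suitable as a diagnostic step, not as a fix for the published file.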
(9/2/2016)
Hi All,
Thank you for the feedback. I will bring this back to our developers to take a look at the RDF representation of the keywords.
Tyler
Today 2/8/17, I just tried again after noting a pretty recent timestamp for http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.rdf (08-Feb-2017 18:37).
A similar error still can be seen in the ORR logs when trying to recognize this file as in RDF/XML format:
ERROR org.apache.jena.riot - {E202} Expecting XML start or end element(s).
String data "http://gcmd.nasa.gov/r/l/TermsOfUse" not allowed. Maybe a striping error.
The only difference in this particular line is that the "See" part has been removed in this new file:
<gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
http://gcmd.nasa.gov/r/l/TermsOfUse</gcmd:termsOfUse>
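Removing "See" does not help because any character data directly inside a top-level element is still rejected. A small pre-upload check along these lines could flag the offending elements; it is a sketch with an assumed function name and a trimmed sample, and it is much narrower than what a full RDF/XML parser such as Jena verifies.

```python
import xml.etree.ElementTree as ET


def top_level_text(rdf_xml: str) -> list:
    """List (tag, text) for children of rdf:RDF that hold character data.

    Hypothetical check: tags are reported in ElementTree's
    {namespace}localname notation.
    """
    root = ET.fromstring(rdf_xml)
    return [
        (child.tag, child.text.strip())
        for child in root
        if child.text and child.text.strip()
    ]


# Trimmed sample mirroring the new file's header (illustration only).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
http://gcmd.nasa.gov/r/l/TermsOfUse</gcmd:termsOfUse>
  <skos:Concept rdf:about="http://gcmdservices.gsfc.nasa.gov/kms/concept/1eb0ea0a-312c-4d74-8d42-6f1ad758f999">
    <skos:prefLabel xml:lang="en">Science Keywords</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

for tag, text in top_level_text(SAMPLE):
    print(tag, "->", text)
# {http://gcmd.gsfc.nasa.gov/}termsOfUse -> http://gcmd.nasa.gov/r/l/TermsOfUse
```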
Is there an online validator we (GCMD) can use, or is the only way to test to try loading the URL into the software?
Greetings from GCMD Development Team!!!!
Hi Thomas,
Thanks for the feedback! Yes, there are online validators. A summary of the previous validation exercise and of a new one I just did is enclosed below. In short, one validator (based on the OWL API library) seems to succeed, while the other (which seems to be based on the Jena library) complains with errors similar to the ones I described. In any case, I'll be looking into doing some updates in the ORR software, in particular to upgrade the Jena library (which is the primary one for ontology parsing/processing in the ORR). Then I will repeat the registration exercise and report here.
In my initial testing I used http://mowl-power.cs.man.ac.uk:8080/validator/, with the resulting failure shown in the screenshot attached to the thread above. But I just tried again and the ontology seems to now be passing ok with this particular validator.
What I just did was to enter http://gcmdservices.gsfc.nasa.gov/static/kms/sciencekeywords/sciencekeywords.rdf
in the form and click Validate:
with the result being
Although the error is gone, it is not clear from the validator whether the contents were loaded successfully. Again, the current situation at the COR is that the file is processed by the OWL API library (what this validator uses), but with no contents actually loaded.
I also just tried https://www.w3.org/RDF/Validator/. In this case the output here reflects what the COR is reporting in the backend logs at the time the Apache Jena library is used:
I'm not sure how up-to-date this particular validator is ... Anyway, I'll try to update our own version of the Jena library and repeat the exercise.
Using a separate program, I just repeated the load exercise with the latest versions of both Jena (3.2.0) and OWL-API (5.0.5).
With the following reduced contents of the file (just keeping one of the skos:Concept elements, but with the same header section, with linefeeds added for legibility):
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
>
<gcmd:termsOfUse xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
http://gcmd.nasa.gov/r/l/TermsOfUse
</gcmd:termsOfUse>
<gcmd:keywordVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
8.4.1
</gcmd:keywordVersion>
<gcmd:schemeVersion xmlns:gcmd="http://gcmd.gsfc.nasa.gov/">
2017-01-26 14:47:47
</gcmd:schemeVersion>
<skos:Concept rdf:about="1eb0ea0a-312c-4d74-8d42-6f1ad758f999"
xml:base="http://gcmdservices.gsfc.nasa.gov/kms/concept/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<skos:inScheme rdf:resource="http://gcmdservices.gsfc.nasa.gov/kms/concepts/concept_scheme/sciencekeywords"/>
<skos:prefLabel xml:lang="en">Science Keywords</skos:prefLabel>
<skos:narrower rdf:resource="e9f67a66-e9fc-435c-b720-ae32a2c3d8f5"/>
<skos:narrower rdf:resource="894f9116-ae3c-40b6-981d-5113de961710"/>
<skos:changeNote>2015-10-09 14:16:54.0 [gee-cee] Remove Concepts
delete narrower relation (null);
</skos:changeNote>
</skos:Concept>
</rdf:RDF>
The results are basically similar to previous tests:
Jena still complains with
[line: 3, col: 143] {E202} Expecting XML start or end element(s).
String data "http://gcmd.nasa.gov/r/l/TermsOfUse" not allowed.
Maybe a striping error.
OWL-API still loads the file without any error but with no loaded content either:
Ontology(OntologyID(Anonymous-2)) [Axioms: 0 Logical Axioms: 0] First 20 axioms: {}
I have assigned this project to a developer for this sprint. He should start working on it this week, and I pointed him to this ticket for feedback. Official responses will come from Tyler.
Hi Carlos,
I've been looking into your issue and I put a "test" version on gcmdservices that I was wondering if you could try... http://gcmdservices.gsfc.nasa.gov/kms_test/concepts/concept_scheme/sciencekeywords.rdf http://gcmdservices.gsfc.nasa.gov/kms_test/concepts/concept_scheme/sciencekeywords.owl
I think the main issues you are having stem from the "terms of use" XML in the RDF. Even though it was namespaced, it doesn't look like the validators like it... So we commented it out.
I tried the first link using the W3C RDF validator and was able to see triples back and no errors in validation.
The second link is an OWL representation generated from the service, and it does validate using the OWL validator you mentioned above. I'm not too familiar with OWL, but I think you'll have better luck using this format.
Let me know how it goes.. If it works we'll graduate this test webapp to the production version.
Thanks, Chris
Hi Chris,
Thanks. I just tried them:
http://gcmdservices.gsfc.nasa.gov/kms_test/concepts/concept_scheme/sciencekeywords.rdf
No issues at all uploading this one to the COR.
Based on the contents, I just used http://gcmdservices.gsfc.nasa.gov/kms/concept
as the URI for the registration,
http://cor.esipfed.org/ont/?uri=http://gcmdservices.gsfc.nasa.gov/kms/concept
but whatever desired URI can be used for the real submission.
http://gcmdservices.gsfc.nasa.gov/kms_test/concepts/concept_scheme/sciencekeywords.owl
This one also uploads with no errors:
(where I used the URI of the detected ontology, but without the trailing ?format=owl)
Since in sciencekeywords.owl all (3,031) concepts are referenced via owl:imports, each target ontology would also have to be registered if all science keyword semantic information is to be captured in COR. Besides potentially generating a significant overhead on the system, this seems unnecessary.
So, I suggest using the sciencekeywords.rdf file (or a similar one if you are still doing adjustments) for the registration, as it already contains all concepts directly. That way there is only one science keyword ontology entry in the COR (not 3K!).
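To get a feel for how many separate registrations the owl:imports approach would imply, the import targets can be listed with a few lines of standard-library Python. This is a sketch only: the function name is made up, and the sample below uses placeholder example.org URIs because the actual layout of sciencekeywords.owl is not reproduced in this thread.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def imported_ontologies(owl_rdf_xml: str) -> list:
    """Return the rdf:resource target of every owl:imports element.

    Hypothetical helper for an OWL file serialized as RDF/XML.
    """
    root = ET.fromstring(owl_rdf_xml)
    return [
        el.get(f"{{{RDF_NS}}}resource")
        for el in root.iter(f"{{{OWL_NS}}}imports")
    ]


# Placeholder sample; the URIs are assumptions, not real GCMD concept URIs.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://example.org/sciencekeywords">
    <owl:imports rdf:resource="http://example.org/concept/a"/>
    <owl:imports rdf:resource="http://example.org/concept/b"/>
  </owl:Ontology>
</rdf:RDF>"""

targets = imported_ontologies(SAMPLE)
print(len(targets), "imported ontologies")  # 2 imported ontologies
```

Each entry in the returned list is a separate ontology that would need its own registration for the imports to resolve inside the registry.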
Besides of course any further questions, please let me know when you think the file is "officially" in production. I can perform the actual registration, but you are welcome to go ahead with it. Before the submission I can create a "GCMD" organization and add you (and anyone else that you can indicate) to that organization. Then any member of the organization can perform the registration and specify "GCMD" as the owner of the registered ontology. In this way we could use your help testing the system!
Thanks again, -c
(Issue reported by T. Stevens on 8/10/2016 -- capturing the email thread here --with some minor editing-- for easier reference.)