owlcollab / owltools

Hard-coded URLs (e.g. GO.xrf_abbs) affected by root domain change #276

Status: closed by kltm 5 years ago

kltm commented 5 years ago

In some places (e.g. ./OWLTools-Annotation/src/main/java/owltools/gaf/rules/go/GoAnnotationRulesFactoryImpl.java), owltools hard-codes old-style URLs such as http://www.geneontology.org/doc/GO.xrf_abbs . These should either use some kind of PURL or be written in a more flexible way.
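One possible reading of "a more flexible way" is to let the URL be overridden at run time and keep the current value only as a default. The following is a minimal sketch under that assumption; the class name and the property name `owltools.go.xrf_abbs.url` are hypothetical and do not exist in owltools.

```java
/** Sketch: resolve a resource URL from a system property, falling back to a default. */
final class ConfigurableUrlSketch {
    // Hypothetical property name, invented for this illustration.
    static final String PROPERTY = "owltools.go.xrf_abbs.url";

    // The URL quoted in this issue, used only as the fallback value.
    static final String DEFAULT_URL = "http://www.geneontology.org/doc/GO.xrf_abbs";

    /** Returns the override given via -Dowltools.go.xrf_abbs.url=..., else the default. */
    static String xrfAbbsLocation() {
        return System.getProperty(PROPERTY, DEFAULT_URL);
    }
}
```

With something like this, a root domain change could be handled by passing `-Dowltools.go.xrf_abbs.url=<new location>` on the command line instead of rebuilding.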

kltm commented 5 years ago

Keeping @cmungall in the know, but see geneontology/operations ticket.

kltm commented 5 years ago

Also http://www.geneontology.org/quality_control/annotation_checks/annotation_qc.xml

kltm commented 5 years ago

While the document is now homed at: https://raw.githubusercontent.com/owlcollab/owltools/master/docs/legacy/annotation_qc.xml , we are still getting an error like:

2019-03-17 15:15:36,471 ERROR (AnnotationRulesFactoryImpl:84) Unable to load document form: http://www.geneontology.org/quality_control/annotation_checks/annotation_qc.xml
org.jdom.input.JDOMParseException: Error on line 1 of document http://geneontology.org/quality_control/annotation_checks/annotation_qc.xml: White spaces are required between publicId and systemId.
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:530)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:905)
    at owltools.gaf.rules.AnnotationRulesFactoryImpl.init(AnnotationRulesFactoryImpl.java:82)
...
    at owltools.cli.CommandRunnerBase.run(CommandRunnerBase.java:68)
    at owltools.cli.CommandLineInterface.main(CommandLineInterface.java:12)
Caused by: org.xml.sax.SAXParseException; systemId: http://geneontology.org/quality_control/annotation_checks/annotation_qc.xml; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
    at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
...
Caused by: org.xml.sax.SAXParseException; systemId: http://geneontology.org/quality_control/annotation_checks/annotation_qc.xml; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
    at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
    at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)

The file at its new home appears to be identical to the previous versions, yet we are still getting this error.

This error can be reproduced with:

~/local/src/git/owltools/OWLTools-Runner/bin/owltools --log-warning /tmp/go-basic.owl --gaf /tmp/foo.gaf --createReport --gaf-report-file -owltools-check.txt --gaf-report-summary-file -summary.txt --gaf-prediction-file -prediction.gaf --gaf-prediction-report-file -prediction-report.txt --gaf-validation-unsatisfiable-module -incoherent.owl --experimental-gaf-prediction-file -prediction-experimental.gaf --experimental-gaf-prediction-report-file -prediction-experimental-report.txt --gaf-run-checks || echo 'errors found'

Odd. I wonder whether the headers are the same between the former and current versions. Googling does not turn up much of an explanation as to what may have changed. May have to start making random changes to see if something works.

Tagging @cmungall

kltm commented 5 years ago

http://52.27.86.54/quality_control/annotation_checks/annotation_qc.xml (old) diffs as identical to http://www.geneontology.org/quality_control/annotation_checks/annotation_qc.xml (new). The responses seen in the browser are quite different, though. Old:

HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Encoding: gzip
Content-Type: text/xml
Date: Mon, 18 Mar 2019 17:51:20 GMT
ETag: "1c6437-9016-4fcaeb1916be4-gzip"
Keep-Alive: timeout=5, max=100
Last-Modified: Wed, 25 Jun 2014 20:12:20 GMT
Server: Apache
Transfer-Encoding: chunked
Vary: Accept-Encoding

New:

HTTP/1.1 301 Moved Permanently
Connection: Keep-Alive
Content-Length: 369
Content-Type: text/html; charset=iso-8859-1
Date: Mon, 18 Mar 2019 17:52:10 GMT
Keep-Alive: timeout=5, max=100
Location: http://geneontology.org/quality_control/annotation_checks/annotation_qc.xml
Server: Apache/2.4.29 (Ubuntu)
...
HTTP/1.1 302 Found
Connection: Keep-Alive
Content-Length: 273
Content-Type: text/html; charset=iso-8859-1
Date: Mon, 18 Mar 2019 17:53:17 GMT
Keep-Alive: timeout=5, max=100
Location: https://raw.githubusercontent.com/owlcollab/owltools/master/docs/legacy/annotation_qc.xml
Server: Apache/2.4.29 (Ubuntu)
...
HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Cache-Control: max-age=300
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 7478
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
Content-Type: text/plain; charset=utf-8
Date: Mon, 18 Mar 2019 17:53:40 GMT
ETag: "96fb5814b9463361f192fcfe9e69f987a03df3c2"
Expires: Mon, 18 Mar 2019 17:58:40 GMT
Source-Age: 198
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
Via: 1.1 varnish
X-Cache: HIT
X-Cache-Hits: 1
X-Content-Type-Options: nosniff
X-Fastly-Request-ID: 3a7e0678b5828ce2e8ae6aa56999b3af7a2eeb75
X-Frame-Options: deny
X-Geo-Block-List: 
X-GitHub-Request-Id: 83DE:8CD2:10E9EE7:121DBC5:5C8FDA5E
X-Served-By: cache-pao17424-PAO
X-Timer: S1552931621.916205,VS0,VE0
X-XSS-Protection: 1; mode=block

kltm commented 5 years ago

Trying with S3 bucket https://s3.amazonaws.com/go-public/metadata/annotation_qc.xml :

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 36886
Content-Type: text/xml
Date: Mon, 18 Mar 2019 17:57:21 GMT
ETag: "a4a7ba2fbb6ad7da1de2ded9ab8080b7"
Last-Modified: Mon, 18 Mar 2019 17:56:37 GMT
Server: AmazonS3
x-amz-id-2: wwUuu4mevFgGgBcAO0aAcvBTJhyQI7MVCzPyVKMF16Aw7hQChLgTa/vMC+tDRRu1IQgr+f8Khao=
x-amz-request-id: 86C234A41E741807

kltm commented 5 years ago

Still getting the same error. @cmungall Is it possible that owltools cannot handle redirects?
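The redirect hypothesis can be checked offline. The sketch below assumes that the 302 response shown earlier (Content-Type: text/html, Content-Length: 273) carried Apache's stock HTML redirect page; the thread does not show that body, so the string here is an assumption. Such a page starts with an HTML 2.0 DOCTYPE that has a public identifier but no system identifier, which is legal in HTML but not in XML. The closing `>` of that DOCTYPE is its 50th character, which would line up with `lineNumber: 1; columnNumber: 50` in the trace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

/** Sketch: feed an HTML redirect page to an XML parser and capture the failure. */
final class RedirectBodyParseSketch {
    // ASSUMPTION: approximate body of the 302 response; only the first line matters here.
    static final String ASSUMED_REDIRECT_BODY =
        "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n"
        + "<html><head><title>302 Found</title></head>\n"
        + "<body><h1>Found</h1></body></html>\n";

    /** 1-based column of the '>' that closes the DOCTYPE declaration. */
    static int doctypeEndColumn() {
        return ASSUMED_REDIRECT_BODY.indexOf('>') + 1;
    }

    /** Parses the text as XML; returns the parse error, or null if it is well-formed. */
    static SAXParseException parseAsXml(String body) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(body)));
            return null;
        } catch (SAXParseException e) {
            return e;
        }
    }
}
```

If `parseAsXml(ASSUMED_REDIRECT_BODY)` fails on line 1 in the same way as the trace above, that would suggest owltools was parsing the redirect page rather than annotation_qc.xml itself.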

cmungall commented 5 years ago

Likely the Java lib used does not follow redirects by default.
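For context, `java.net.HttpURLConnection` follows redirects by default but does not switch protocols, so the final hop in the header dump above (http://geneontology.org to https://raw.githubusercontent.com) would be left unfollowed. Whether that is the exact code path owltools takes is an assumption here. The sketch below shows one way to follow such a chain by hand; the class name and `MAX_HOPS` are hypothetical, and it is not necessarily what any eventual fix does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

/** Sketch: follow HTTP redirects manually, including http -> https hops. */
final class RedirectFollowingSketch {
    static final int MAX_HOPS = 5; // arbitrary guard against redirect loops

    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
            || status == 307 || status == 308;
    }

    /** Resolves a Location header (absolute or relative) against the current URI. */
    static URI resolve(URI current, String location) {
        return current.resolve(location);
    }

    /** Opens an http(s) URI, following up to MAX_HOPS redirects. */
    static InputStream open(URI start) throws IOException {
        URI current = start;
        for (int hop = 0; hop <= MAX_HOPS; hop++) {
            HttpURLConnection conn =
                (HttpURLConnection) current.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle every hop ourselves
            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn.getInputStream(); // caller closes the stream
            }
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location from " + current);
            }
            current = resolve(current, location);
        }
        throw new IOException("Too many redirects starting from " + start);
    }
}
```

The returned stream could then be handed to the XML parser in place of the raw URL, so the parser never sees an intermediate redirect page.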

kltm commented 5 years ago

Choices now would seem to be:

  1. change the owltools code/lib to accept redirects (or remove the need for the external file completely and just use one embedded locally)
  2. directly embed the file into geneontology.github.io at the right location as a static file

Option 1 is the right solution; option 2 may be easier to get out the door immediately.
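The parenthetical in the first choice (use a locally embedded copy) could look roughly like the sketch below: prefer a copy bundled on the classpath and only go to the network when none is found. The class name and resource path are hypothetical; owltools may not ship the file under that name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

/** Sketch: prefer a bundled copy of a file, fall back to a remote URL. */
final class BundledRulesSketch {
    // Hypothetical classpath location for an embedded annotation_qc.xml.
    static final String DEFAULT_RESOURCE = "/annotation_qc.xml";

    /**
     * Opens the bundled resource if it exists on the classpath;
     * otherwise opens the fallback URL. The caller closes the stream.
     */
    static InputStream open(String resourcePath, URL fallback) throws IOException {
        InputStream bundled = BundledRulesSketch.class.getResourceAsStream(resourcePath);
        if (bundled != null) {
            return bundled; // no network access needed
        }
        if (fallback == null) {
            throw new IOException("No bundled copy at " + resourcePath
                + " and no fallback URL given");
        }
        return fallback.openStream();
    }
}
```

The trade-off is that an embedded copy never breaks when a domain moves, but it only picks up rule changes when the tool itself is rebuilt.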

dougli1sqrd commented 5 years ago

https://github.com/owlcollab/owltools/pull/279

kltm commented 5 years ago

This seems to be fixed with #279

pgaudet commented 5 years ago

Great!