readium / readium-test-files

Some ePub3 files used to demonstrate the capabilities of the Readium SDK and derived reading systems.

Need to resolve issues with HTTP(S) settings #8

Open rkwright opened 6 years ago

rkwright commented 6 years ago

The files posted at http://readium.org/readium-test-files/ do not have correct settings for HTTPS. This is mainly due to settings for readium.org.

The following notes are from Daniel Weck.

rkwright commented 6 years ago

from @danielweck

==========================

4) Out of curiosity (to ensure that Readium's cloud/web reader optimally fetches data from the given links), I checked the HTTP CORS headers in the EPUB URLs (Access-Control-Allow-Origin, etc.), as well as HTTP 1.1 "Accept-Ranges: bytes", the "Content-Type" header, and secure HTTPS support. Here is an example with epub30-test-0201.epub:

curl -I -X GET {LINK_URL}

When {LINK_URL} = http[s]://readium.github.io/readium-test-files/functional/Revised-TS-FXL/epub30-test-0201.epub (note that the result is the same with HTTPS and HTTP) => I imagine there is a DNS CNAME domain redirection for readium.org ==> readium.github.io, because the response is HTTP 301 "permanently moved" to:

{LINK_URL} = http://readium.org/readium-test-files/functional/Revised-TS-FXL/epub30-test-0201.epub (note that this is not secure HTTPS!) => the HTTP response correctly supplies all useful headers.

{LINK_URL} = https://raw.githubusercontent.com/readium/readium-test-files/master/functional/Revised-TS-FXL/epub30-test-0201.epub (this is GitHub's default "download" URL from their web interface) => interestingly, correctly supplies HTTP CORS and range headers, but 'Content-Type' is "application/octet-stream" instead of "application/epub+zip".

{LINK_URL} = https://rawgit.com/readium/readium-test-files/master/functional/Revised-TS-FXL/epub30-test-0201.epub => interestingly, RawGit responds with HTTP 301 "permanently moved" to raw.githubusercontent.com (see above).

So, it would seem that the best URL format is: http://readium.org/readium-test-files/functional/Revised-TS-FXL/epub30-test-02{XX}.epub ...but, read below :(
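The checks described above (CORS, byte ranges, Content-Type) can be sketched as a small function over the headers that `curl -I` prints. This is only an illustration: `audit_headers` is a hypothetical helper, and the sample header sets in the usage note are assumptions modelled on the results reported above, not captured responses.

```python
def audit_headers(headers):
    """Return a list of problems found in an EPUB response's headers."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value.strip() for name, value in headers.items()}
    problems = []
    if "access-control-allow-origin" not in h:
        problems.append("missing Access-Control-Allow-Origin")
    if h.get("accept-ranges", "").lower() != "bytes":
        problems.append("no 'Accept-Ranges: bytes'")
    # Ignore any parameters after ';' (e.g. a charset).
    ctype = h.get("content-type", "").split(";")[0].strip().lower()
    if ctype != "application/epub+zip":
        problems.append(f"Content-Type is {ctype or 'absent'!r}")
    return problems
```

For a header set like the one raw.githubusercontent.com is described as returning (CORS and range headers present, Content-Type "application/octet-stream"), the function reports only the Content-Type mismatch.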

==========================

5) There is a secure HTTP problem with https://readium.org:

Error code: SSL_ERROR_BAD_CERT_DOMAIN

readium.org uses an invalid security certificate. The certificate is only valid for the following names: www.github.com, *.github.io, *.githubusercontent.com, *.github.com, github.com, github.io, githubusercontent.com
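To illustrate why the browser rejects the certificate, here is a simplified sketch of hostname matching against the names listed in the error message. It assumes the common rule that a leading "*" stands for exactly one label; real TLS clients apply more rules than this, and `name_matches` / `cert_covers` are hypothetical helpers.

```python
# Names copied from the SSL_ERROR_BAD_CERT_DOMAIN message above.
CERT_NAMES = [
    "www.github.com", "*.github.io", "*.githubusercontent.com",
    "*.github.com", "github.com", "github.io", "githubusercontent.com",
]

def name_matches(hostname, pattern):
    """Simplified match: a leading '*' covers exactly one label."""
    host_labels = hostname.lower().split(".")
    pattern_labels = pattern.lower().split(".")
    if len(host_labels) != len(pattern_labels):
        return False
    if pattern_labels[0] == "*":
        return host_labels[1:] == pattern_labels[1:]
    return host_labels == pattern_labels

def cert_covers(hostname, names=CERT_NAMES):
    """True if any certificate name matches the hostname."""
    return any(name_matches(hostname, name) for name in names)
```

Under these assumptions `cert_covers("readium.org")` is false, while `cert_covers("readium.github.io")` and `cert_covers("raw.githubusercontent.com")` are true, which is consistent with the error appearing only on the custom domain.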

This is problematic because the Readium cloud/web reader app (just as any other website served over HTTPS) cannot mix secure HTTPS and insecure HTTP, so we cannot use the optimum http[s]://readium.org URL mentioned above. Instead, we have to fall back to https://raw.githubusercontent.com (which serves "application/octet-stream" instead of "application/epub+zip" in the HTTP Content-Type header). Both Content-Types are supported by the Readium web app so this is not a deal-breaker, but it still sucks that we cannot directly use the http[s]://readium.org links (or even https://readium.github.io because of the HTTP 301 redirect to insecure readium.org).

Example of a working Readium web/cloud reader link: https://readium.firebaseapp.com/?epub=https%3A%2F%2Fraw.githubusercontent.com%2Freadium%2Freadium-test-files%2Fmaster%2Ffunctional%2FRevised-TS-FXL%2Fepub30-test-0201.epub

...also works with RawGit as this service responds with an HTTP 301 redirect to the above regular GitHub URL: https://readium.firebaseapp.com/?epub=https%3A%2F%2Frawgit.com%2Freadium%2Freadium-test-files%2Fmaster%2Ffunctional%2FRevised-TS-FXL%2Fepub30-test-0201.epub
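The reader links above are just the EPUB URL percent-encoded into the `epub` query parameter. A minimal sketch of building one, where `reader_link` is a hypothetical helper and the reader base URL is the one used in the examples above:

```python
from urllib.parse import quote

READER = "https://readium.firebaseapp.com/"

def reader_link(epub_url, reader=READER):
    """Build a cloud reader link with the EPUB URL fully percent-encoded."""
    # safe="" also encodes ':' and '/', matching the links shown above.
    return f"{reader}?epub={quote(epub_url, safe='')}"
```

Passing the raw.githubusercontent.com URL for epub30-test-0201.epub reproduces the first working link quoted above.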

========================== 6) Unfortunately none of the URLs listed above respond with the HTTP CORS header allowing Content-Length to be queried remotely (from another origin). The net result is that the Readium cloud/web reader is not capable of using HTTP 1.1 Accept-Ranges, so the app falls back to downloading the entire EPUB in memory instead of fetching byte ranges as needed.

Note that we currently have the exact same problem with packed/zipped EPUB files hosted at Firebase and Surge, so I will check our current configuration [8] to see if we can apply similar overrides as we do with the Readium2 NodeJS streamer [9].

curl -I -X GET https://readium.firebaseapp.com/epub_content/internal_link.epub

=> Access-Control-Allow-Origin = "*"

...but missing:

Access-Control-Allow-Methods = "GET, HEAD, OPTIONS" (intentionally excludes POST, DELETE, PUT, PATCH)

Access-Control-Allow-Headers = "Content-Type, Content-Length, Accept-Ranges, Link, Transfer-Encoding"

[8] https://github.com/readium/readium-js-viewer/blob/develop/firebase.json

[9] https://github.com/edrlab/r2-streamer-js/blob/52270ae154cdc4d8d3460d5bd62fa3c7235113b5/src/http/server.ts#L484-L493
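As a rough sketch of the check described above, the following parses the text printed by `curl -I` and reports which of the three CORS headers named in this comment are absent. The header values in `WANTED` are the ones listed above; `parse_head` and `missing_cors` are hypothetical helpers, and any sample response fed to them is an assumption, not captured output.

```python
# The three CORS headers discussed above, with the values quoted there.
WANTED = {
    "access-control-allow-origin": "*",
    "access-control-allow-methods": "GET, HEAD, OPTIONS",
    "access-control-allow-headers":
        "Content-Type, Content-Length, Accept-Ranges, Link, Transfer-Encoding",
}

def parse_head(raw):
    """Turn raw 'curl -I' output into a dict with lowercased header names."""
    headers = {}
    for line in raw.splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers

def missing_cors(raw):
    """List the wanted CORS header names that the response lacks."""
    got = parse_head(raw)
    return [name for name in WANTED if name not in got]
```

A response that carries only Access-Control-Allow-Origin, as described above for the Firebase-hosted file, would yield the other two header names.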

rkwright commented 6 years ago

from @danielweck

Quick follow-up about HTTP CORS:

With a bit of help from https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS I fixed the Firebase headers configuration: https://github.com/readium/readium-js-viewer/blob/develop/firebase.json

CLI test:

curl -I -X GET https://readium.firebaseapp.com/epub_content/internal_link.epub

Readium cloud/web reader test:

https://readium.surge.sh/?epub=https%3A%2F%2Freadium.firebaseapp.com%2Fepub_content%2Finternal_link.epub

rkwright commented 6 years ago

from @danielweck

Regarding the SSL_ERROR_BAD_CERT_DOMAIN error, the CNAME looks correct ( https://github.com/readium/readium.github.io/blob/master/CNAME ) but perhaps the HTTPS configuration is incorrect, see: https://help.github.com/articles/securing-your-github-pages-site-with-https/ and: https://help.github.com/articles/quick-start-setting-up-a-custom-domain/ More info: https://help.github.com/articles/adding-or-removing-a-custom-domain-for-your-github-pages-site/ And: https://help.github.com/articles/troubleshooting-custom-domains/#https-errors

rkwright commented 6 years ago

from @danielweck

by the way: once HTTPS works with enforcement / auto-redirect (recommended practice nowadays), it might be worth considering setting a canonical URL for the Jekyll website:

https://github.com/readium/readium.github.io/blob/master/_config.yml#L22 (e.g. https://readium.org)

https://github.com/readium/readium.github.io/blob/master/_includes/head.html#L6 (usually <link rel="canonical" href="{{ site.url }}{{ page.url }}" /> but this depends on your config ... right now the generated string is not an absolute URL, so there is a problem somewhere)

rkwright commented 6 years ago

from @danielweck

I realize I am digressing a bit in this email thread, but in fairness this kind of erroneous HTTPS configuration does in fact impact Readium.org 's ability to host sites, serve files, etc. (especially when handshaking across domains / origins, such as HTTP CORS with the cloud reader).

... anyway, I will just mention these last few debugging things (you may copy/paste for future reference, and/or pass onto web-admin @ Readium Foundation):

HTTPS checks (readium.github.io): https://mxtoolbox.com/SuperTool.aspx?action=https:readium.github.io&run=toolpage#

HTTPS checks (readium.org, same as above but with a mismatched name): https://mxtoolbox.com/SuperTool.aspx?action=https:readium.org&run=toolpage#

WHOIS DNS lookup: https://mxtoolbox.com/SuperTool.aspx?action=whois:readium.org&run=toolpage#

A DNS lookup: https://mxtoolbox.com/SuperTool.aspx?action=a:readium.org&run=toolpage#

CNAME DNS lookup: https://mxtoolbox.com/SuperTool.aspx?action=cname:readium.org&run=toolpage#

>>> dig www.readium.org +nostats +nocmd +nocomments

; <<>> DiG 9.8.3-P1 <<>> www.readium.org +nostats +nocmd +nocomments
;; global options: +cmd
;www.readium.org.        IN    A
www.readium.org.    359    IN    A    185.199.108.153

>>> dig readium.org +nostats +nocmd +nocomments

; <<>> DiG 9.8.3-P1 <<>> readium.org +nostats +nocmd +nocomments
;; global options: +cmd
;readium.org.            IN    A
readium.org.        478    IN    A    185.199.109.153
readium.org.        478    IN    A    185.199.110.153
readium.org.        478    IN    A    185.199.111.153
readium.org.        478    IN    A    185.199.108.153

>>> curl -I -X GET http://www.readium.org

HTTP/1.1 301 Moved Permanently
Location: http://readium.org/

>>> curl -I -X GET https://www.readium.org --insecure

HTTP/1.1 301 Moved Permanently
Location: https://readium.org/

>>> curl -I -X GET http://readium.org

HTTP/1.1 200 OK
Server: GitHub.com

>>> curl -I -X GET https://readium.org --insecure

HTTP/1.1 200 OK
Server: GitHub.com

>>> curl -I -X GET http://readium.github.io

HTTP/1.1 301 Moved Permanently
Location: http://readium.org/

>>> curl -I -X GET https://readium.github.io

HTTP/1.1 301 Moved Permanently
Location: http://readium.org/

(note the non-secure HTTP redirect with this last one ... I suspect "HTTPS enforcement" has not been turned on in GitHub?)
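The problem in the last response can be spotted mechanically: a redirect is a downgrade when the request URL is HTTPS and the Location header is plain HTTP. A small sketch using the request/Location pairs from the curl output above (`is_downgrade` is a hypothetical helper):

```python
from urllib.parse import urlsplit

def is_downgrade(url, location):
    """True when an HTTPS request is redirected to a plain HTTP URL."""
    return urlsplit(url).scheme == "https" and urlsplit(location).scheme == "http"

# (request URL, Location header) pairs taken from the 301 responses above.
REDIRECTS = [
    ("http://www.readium.org", "http://readium.org/"),
    ("https://www.readium.org", "https://readium.org/"),
    ("http://readium.github.io", "http://readium.org/"),
    ("https://readium.github.io", "http://readium.org/"),
]

downgrades = [src for src, dst in REDIRECTS if is_downgrade(src, dst)]
```

With these pairs, only https://readium.github.io ends up in `downgrades`.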

danielweck commented 6 years ago

Above excerpts from this email discussion thread (readium-dev Google Group): https://groups.google.com/forum/#!topic/readium-dev/rAzbKu2Jmtk

danielweck commented 6 years ago

Useful: https://help.github.com/articles/troubleshooting-custom-domains/#https-errors

rkwright commented 6 years ago

@danielweck Given that readium.org now has HTTPS enabled by default, is this issue now moot - other than the fact it is a great piece of documentation on CORS and Readium?

danielweck commented 6 years ago

The issue can be closed.