readium / r2-streamer-js

NodeJS Readium2 "streamer"
BSD 3-Clause "New" or "Revised" License

Unable to use cloudfront signed url in Load HTTP publication URL! #36

Closed: nguyenhuutuananh closed this issue 5 years ago

nguyenhuutuananh commented 6 years ago

Hi @danielweck,

Today I need to process an EPUB stored in S3 and served through a CloudFront signed URL. The same EPUB parses fine when it is stored on the streamer server, but after I upload it to S3, the streamer can't parse its spine.

Epub uuid: urn:uuid:6f796b4e-0cd5-4ed5-a319-2fe6a9d14e9f

As I said above, the same EPUB can still be parsed when its URL is not a CloudFront signed URL.

Stored on the streamer server: [screenshot]

Stored in S3 with a signed URL: [screenshot]

Thanks, Tuan Anh

danielweck commented 6 years ago

In your screenshot, the "!!!!!! NO MEDIA TYPE?!" error indicates that you are loading a CBZ file, not an EPUB. Is that correct? If so, can you please verify the file extensions inside your CBZ? They should be image.jpg, image.jpeg, image.png, etc. Or maybe they are upper case, such as image.JPG?

See the code that checks for file extensions in order to determine the HTTP Content Type (aka "mime" type):

https://github.com/readium/r2-shared-js/blob/70f65cb5834fa9f2d3497f532997269dad6a2951/src/parser/cbz.ts#L64 ( https://github.com/readium/r2-shared-js/blob/develop/src/parser/cbz.ts#L64 )
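For illustration, here is a minimal TypeScript sketch of the kind of extension check described above, assuming a hypothetical helper mediaTypeForEntry and a hypothetical lookup table (neither is taken from the linked cbz.ts). Lowercasing the extension before the lookup makes an upper-case name such as page01.JPG resolve the same way as page01.jpg.

```typescript
import * as path from "path";

// Hypothetical lookup table for illustration only; it is not the table
// used by r2-shared-js. Keys are lower-case extensions.
const IMAGE_MEDIA_TYPES = new Map<string, string>([
    [".gif", "image/gif"],
    [".jpeg", "image/jpeg"],
    [".jpg", "image/jpeg"],
    [".png", "image/png"],
    [".svg", "image/svg+xml"],
    [".webp", "image/webp"],
]);

// Returns the media type for a ZIP entry name, or undefined when the
// extension is not in the table (i.e. no media type can be assigned).
function mediaTypeForEntry(entryName: string): string | undefined {
    // Lowercasing makes "page01.JPG" and "page01.jpg" equivalent.
    const ext = path.extname(entryName).toLowerCase();
    return IMAGE_MEDIA_TYPES.get(ext);
}

console.log(mediaTypeForEntry("images/page01.JPG")); // image/jpeg
console.log(mediaTypeForEntry("images/page02.png")); // image/png
console.log(mediaTypeForEntry("images/notes.txt")); // undefined
```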

danielweck commented 6 years ago

Furthermore, I do not really understand "s3 with cloudfront signed url". Do you deploy the entire r2-streamer-js instance to this service? (including the misc/epubs/ folder which contains the publications you are trying to load into the streamer, when the server starts) Or do you deploy the streamer and the publications to different hosting services?

danielweck commented 6 years ago

PS: what version / Git revision of r2-streamer-js are you testing with?

Official NPM distribution: https://www.npmjs.com/package/r2-streamer-js/v/1.0.0-alpha.5

Official GitHub release tag: https://github.com/readium/r2-streamer-js/tree/v1.0.0-alpha.5

Changelog: https://github.com/readium/r2-streamer-js/blob/develop/CHANGELOG.md

Changelogs of the r2-xxx-js package dependencies:
https://github.com/readium/r2-shared-js/blob/develop/CHANGELOG.md
https://github.com/readium/r2-utils-js/blob/develop/CHANGELOG.md
https://github.com/readium/r2-opds-js/blob/develop/CHANGELOG.md
https://github.com/readium/r2-lcp-js/blob/develop/CHANGELOG.md

nguyenhuutuananh commented 6 years ago

I uploaded an EPUB file, not a CBZ file, but the streamer still parses it as a CBZ. I will try to find out why; maybe the problem is my EPUB. Thank you very much!

nguyenhuutuananh commented 6 years ago

Sorry, my English is not very good.

I mean that I load the EPUB through "Load HTTP publication URL" using a CloudFront signed URL: http://dev.mydomain.net/uploads/Epub-Example-8-edited.epub?Expires=1540374118&Signature=ehkEQwnfLMnOMuKUOpgsC1cJz7PkPKazdoLJyPota0pFGKWjuxJ1MKbzLJ-YUAXAwOkFJdpnntYyxy9k3bIIxnZyppcWB5wAwvOaVIDQNIoOxvuM3n8764QttiyOleTLhjZSGDY0qpBLp5vJAE0pKRGl7332MtnhutfCapaG-0SYNcMYj5JtZL4WzjW2TzyH8OJlREGz-kciKxStQSSmm3IgT8S-qg04~PWTFCKbBXtEB-E-Cpk9LBi0cgJPL6-iRNWwInsIDP9wDDEtFT9UZhsM8vIgAcg9t5CGGHk8JnRcKU3EbBck-h9LOMXg1fTunaXUSZ01TtNF0oQdEiNQRA__&Key-Pair-Id=My-Key-Pair-Id

nguyenhuutuananh commented 6 years ago

Finally, I found out why the streamer can't parse an EPUB from a CloudFront signed URL.

https://github.com/readium/r2-shared-js/blob/develop/src/parser/publication-parser.ts#L18

const ext = path.extname(fileName).toLowerCase();

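The path.extname function treats everything from the last dot in the last path segment to the end of the string as the file extension, so a query string that follows the file name becomes part of the "extension".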

With the URL http://dev.mydomain.net/uploads/Epub-Example-8-edited.epub?Expires=1540374118&Signature=abc__&Key-Pair-Id=My-Key-Pair-Id, path.extname returns .epub?Expires=1540374118&Signature=abc__&Key-Pair-Id=My-Key-Pair-Id, so the test /\.epub[3]?$/.test(ext) returns false.
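A minimal TypeScript sketch reproducing this behavior with Node's path module, using the shortened URL from this comment. The pickParser function is hypothetical: it assumes the parser falls back to CBZ when the EPUB test fails, which is consistent with the EPUB being parsed as a CBZ earlier in this thread, but it is not taken from publication-parser.ts.

```typescript
import * as path from "path";

// Shortened signed URL from the comment above (signature elided as "abc__").
const signedUrl =
    "http://dev.mydomain.net/uploads/Epub-Example-8-edited.epub" +
    "?Expires=1540374118&Signature=abc__&Key-Pair-Id=My-Key-Pair-Id";

// path.extname takes the last path segment and returns everything from its
// last "." to the end of the string, so the query string is included.
const ext = path.extname(signedUrl).toLowerCase();
console.log(ext);
// .epub?expires=1540374118&signature=abc__&key-pair-id=my-key-pair-id

// The EPUB check is anchored with "$", so the trailing query string defeats it.
console.log(/\.epub[3]?$/.test(ext)); // false

// Hypothetical dispatch (assumption, for illustration only): anything that
// fails the EPUB test is handed to a CBZ parser.
function pickParser(fileName: string): "epub" | "cbz" {
    const e = path.extname(fileName).toLowerCase();
    return /\.epub[3]?$/.test(e) ? "epub" : "cbz";
}

console.log(pickParser(signedUrl)); // cbz
console.log(pickParser("misc/epubs/Epub-Example-8-edited.epub")); // epub
```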

To fix my problem, I had to change this line:

const ext = path.extname(fileName).toLowerCase();

to:

const ext = path.extname(fileName.split("?").shift() || fileName).toLowerCase();
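The one-line workaround can be wrapped in a small helper to check it against the signed URL. The name extensionWithoutQuery is hypothetical; its body is the expression proposed in this comment.

```typescript
import * as path from "path";

// Drops everything after the first "?" before computing the extension.
// split() always returns at least one element, but shift() is typed as
// string | undefined, hence the "|| fileName" fallback (also taken when
// fileName starts with "?", because shift() then returns "").
function extensionWithoutQuery(fileName: string): string {
    return path.extname(fileName.split("?").shift() || fileName).toLowerCase();
}

const signedUrl =
    "http://dev.mydomain.net/uploads/Epub-Example-8-edited.epub" +
    "?Expires=1540374118&Signature=abc__&Key-Pair-Id=My-Key-Pair-Id";

console.log(extensionWithoutQuery(signedUrl)); // .epub
console.log(/\.epub[3]?$/.test(extensionWithoutQuery(signedUrl))); // true

// Limitation: a "#fragment" is not stripped by this workaround.
console.log(extensionWithoutQuery("http://example.com/book.epub#chapter1"));
// .epub#chapter1
```

This only strips the query string: a URL ending in a fragment still yields an extension such as .epub#chapter1, which is one reason to parse the URL properly instead of splitting on "?".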

Thanks

danielweck commented 6 years ago

Ah, that's right, thanks for debugging this! We should definitely sanitize the path using a URL / URI utility, as we do elsewhere in the R2 JS code. I will look into this.
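A minimal sketch of the URL-utility approach suggested here, assuming Node's WHATWG URL class from the url module and a hypothetical helper publicationExtension; it is not the code from the fix commit. Only http(s) locations are parsed as URLs (an assumption of this sketch); everything else is treated as a local file path.

```typescript
import * as path from "path";
import { URL } from "url";

// Hypothetical helper: derives the extension from a publication location
// that may be an http(s) URL or a plain filesystem path.
function publicationExtension(location: string): string {
    if (/^https?:\/\//i.test(location)) {
        // URL.pathname excludes both the query string and the fragment.
        const pathname = new URL(location).pathname;
        // URL paths always use "/", so use the POSIX flavor of extname.
        return path.posix.extname(pathname).toLowerCase();
    }
    // Local paths are handled as before.
    return path.extname(location).toLowerCase();
}

const signedUrl =
    "http://dev.mydomain.net/uploads/Epub-Example-8-edited.epub" +
    "?Expires=1540374118&Signature=abc__&Key-Pair-Id=My-Key-Pair-Id";

console.log(publicationExtension(signedUrl)); // .epub
console.log(publicationExtension("https://example.com/a/Book.EPUB3#toc")); // .epub3
console.log(publicationExtension("misc/epubs/book.epub")); // .epub
```

Because URL.pathname never includes the query string or the fragment, CloudFront parameters such as Expires, Signature, and Key-Pair-Id no longer leak into the extension.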

danielweck commented 5 years ago

Fixed: https://github.com/readium/r2-shared-js/commit/d494c2eecac771d04ea7432c917357e1678d292c