readium / readium-lcp-server

Repository for the Readium LCP Server
BSD 3-Clause "New" or "Revised" License

Some EPUB2 publications do not use proper NCX content-type in OPF => navigation document gets encrypted, not compliant with the LCP specification #236

Open danielweck opened 3 years ago

danielweck commented 3 years ago

The LCP specification: https://github.com/readium/lcp-specs/blob/master/releases/lcp/latest.md#21-encrypted-resources

In addition, this specification defines that the following files must not be encrypted:

- META-INF/license.lcpl
- Navigation Documents referenced in any Package Document from the Publication (all Publication Resources listed in the Publication manifest with the "nav" property)
- NCX documents referenced in any Package Document from the Publication (all Publication Resources listed in the Publication manifest with the media type "application/x-dtbncx+xml")
- Cover images (all Publication Resources listed in the Publication manifest with the "cover-image" property)
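For illustration, the quoted rules can be read as a predicate over manifest entries. This is only a sketch, not the server's code: the `Item` type, its fields, and `mustStayClear` are hypothetical names, and it covers only the files listed in the quote above (the quote starts with "In addition", so the specification exempts other files that are not handled here).

```go
package main

import "strings"

// Item is a hypothetical, simplified view of a resource in the EPUB container.
// Path is the path inside the container; MediaType and Properties mirror the
// attributes of the OPF manifest entry (empty if the file is not in the manifest).
type Item struct {
	Path       string
	MediaType  string
	Properties string // space-separated list, as in the OPF "properties" attribute
}

const ncxMediaType = "application/x-dtbncx+xml"

// hasProperty reports whether the space-separated properties list contains p.
func hasProperty(properties, p string) bool {
	for _, v := range strings.Fields(properties) {
		if v == p {
			return true
		}
	}
	return false
}

// mustStayClear reports whether one of the quoted rules forbids encrypting it.
func mustStayClear(it Item) bool {
	return it.Path == "META-INF/license.lcpl" ||
		hasProperty(it.Properties, "nav") ||
		it.MediaType == ncxMediaType ||
		hasProperty(it.Properties, "cover-image")
}
```

Note that the NCX rule is the only one keyed on the media type, so under this reading an NCX declared with any other media type is not exempt.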

To reproduce:

LSD: https://lsd-test.edrlab.org/licenses/999806a1-62cd-4f95-a455-6fd9099929c7/status

LCP license: https://front-test.edrlab.org/api/v1/licenses/999806a1-62cd-4f95-a455-6fd9099929c7

EPUB download link: https://lcp-test.edrlab.org/contents/1957d6ca-12b3-4918-baa5-41810a2e9e8e

content.opf:

<manifest>
  <item id="ncx" href="toc.ncx" media-type="text/xml"/>
...
</manifest>
<spine toc="ncx">
...

Note that the manifest item declares media-type="text/xml" instead of application/x-dtbncx+xml, so the Go code does not recognize the item as the NCX and goes on to encrypt the resource.
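To make the failure mode concrete, here is a minimal sketch of a media-type-only lookup. It is not the code in reader.go: `opfPackage`, `opfItem`, and `findNCXByMediaType` are hypothetical names, and the OPF model is reduced to the attributes shown above.

```go
package main

import "encoding/xml"

// opfPackage is a hypothetical, minimal model of an OPF package document.
type opfPackage struct {
	Manifest struct {
		Items []opfItem `xml:"item"`
	} `xml:"manifest"`
}

// opfItem holds the manifest item attributes used in this sketch.
type opfItem struct {
	ID        string `xml:"id,attr"`
	Href      string `xml:"href,attr"`
	MediaType string `xml:"media-type,attr"`
}

const ncxMediaType = "application/x-dtbncx+xml"

// findNCXByMediaType returns the href of the first manifest item declared
// with the NCX media type. The boolean is false when no such item exists.
func findNCXByMediaType(opf []byte) (string, bool, error) {
	var pkg opfPackage
	if err := xml.Unmarshal(opf, &pkg); err != nil {
		return "", false, err
	}
	for _, it := range pkg.Manifest.Items {
		if it.MediaType == ncxMediaType {
			return it.Href, true, nil
		}
	}
	return "", false, nil
}
```

With the manifest from this publication, the lookup finds nothing, so an encryptor relying on it alone would treat toc.ncx like any other resource.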

Code references:

https://github.com/readium/readium-lcp-server/blob/de6e2380aec0dad963a5dfb4bea044d116f53fd2/epub/reader.go#L170-L192

https://github.com/readium/readium-lcp-server/blob/de6e2380aec0dad963a5dfb4bea044d116f53fd2/epub/epub.go#L26

danielweck commented 3 years ago

Possible solution: discover the NCX via the spine@toc="ncx" attribute, which references the manifest item with id="ncx". This feels heavy-handed, but it is a viable workaround for badly authored EPUB2 publications (assuming media-type="application/x-dtbncx+xml" is mandatory for NCX; I must admit I haven't checked the specification).
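A rough sketch of the spine@toc lookup, assuming the OPF is parsed with encoding/xml. The type and function names (`opfPackage`, `opfItem`, `findNCXBySpineToc`) are hypothetical and not taken from the repository; only the attributes shown in the content.opf excerpt are modelled.

```go
package main

import "encoding/xml"

// opfPackage is a hypothetical, minimal model of an OPF package document.
type opfPackage struct {
	Manifest struct {
		Items []opfItem `xml:"item"`
	} `xml:"manifest"`
	Spine struct {
		Toc string `xml:"toc,attr"` // id of the manifest item that is the NCX
	} `xml:"spine"`
}

// opfItem holds the manifest item attributes used in this sketch.
type opfItem struct {
	ID        string `xml:"id,attr"`
	Href      string `xml:"href,attr"`
	MediaType string `xml:"media-type,attr"`
}

// findNCXBySpineToc resolves the NCX through the spine's toc attribute,
// ignoring the declared media type. It returns the item's href as written
// in the manifest (relative to the OPF file). The boolean is false when the
// spine has no toc attribute or the id matches no manifest item.
func findNCXBySpineToc(opf []byte) (string, bool, error) {
	var pkg opfPackage
	if err := xml.Unmarshal(opf, &pkg); err != nil {
		return "", false, err
	}
	if pkg.Spine.Toc == "" {
		return "", false, nil
	}
	for _, it := range pkg.Manifest.Items {
		if it.ID == pkg.Spine.Toc {
			return it.Href, true, nil
		}
	}
	return "", false, nil
}
```

An encryptor could then exempt the returned href in addition to any items matched by media type, so that well-formed publications keep their current behavior.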

Other solution: fix reading systems (e.g. Thorium, which currently fails to open the publication) by either ignoring the encrypted NCX, or by implementing a delayed parsing strategy (I think ReadiumSDK C/C++ code implemented an alternative codepath to wait for the LCP passphrase before starting the parsing of core resources such as NCX/NavDoc or Cover Image).

Other solution: do nothing. Content creators / publishers have the responsibility to fix their publications (the reality is that many "legacy" publications exist and will never be updated).

Other solution: the Go code rejects such a publication when it detects the inconsistency

... or alternatively the Go code patches the incorrect media-type (bad idea I think, not just because it would break resource-level signatures, but also because this is not the responsibility of the LCP encryptor)
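The rejection option could look roughly like the check below, run before encryption. This is a sketch under assumptions: `checkNCXDeclaration` and the OPF model types are hypothetical names, and treating a dangling spine@toc reference as an error is a choice made for this example, not something decided in this thread.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// opfPackage is a hypothetical, minimal model of an OPF package document.
type opfPackage struct {
	Manifest struct {
		Items []opfItem `xml:"item"`
	} `xml:"manifest"`
	Spine struct {
		Toc string `xml:"toc,attr"`
	} `xml:"spine"`
}

// opfItem holds the manifest item attributes used in this sketch.
type opfItem struct {
	ID        string `xml:"id,attr"`
	Href      string `xml:"href,attr"`
	MediaType string `xml:"media-type,attr"`
}

const ncxMediaType = "application/x-dtbncx+xml"

// checkNCXDeclaration returns an error when the manifest item referenced by
// spine@toc is missing or is not declared with the NCX media type.
// A spine without a toc attribute is accepted as is.
func checkNCXDeclaration(opf []byte) error {
	var pkg opfPackage
	if err := xml.Unmarshal(opf, &pkg); err != nil {
		return err
	}
	toc := pkg.Spine.Toc
	if toc == "" {
		return nil
	}
	for _, it := range pkg.Manifest.Items {
		if it.ID != toc {
			continue
		}
		if it.MediaType != ncxMediaType {
			return fmt.Errorf("manifest item %q (%s) is referenced by spine@toc but declares media-type %q, expected %q",
				it.ID, it.Href, it.MediaType, ncxMediaType)
		}
		return nil
	}
	return fmt.Errorf("spine@toc=%q does not match any manifest item", toc)
}
```

Unlike patching the media type, this leaves the publication untouched and pushes the fix back to the content creator.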

danielweck commented 3 years ago

Related issue: https://github.com/readium/readium-lcp-server/issues/129