readium / r2-shared-js

Shared models for Readium Desktop
BSD 3-Clause "New" or "Revised" License
11 stars 11 forks source link

Add support for extracting page map from Adobe Digital Editions ePUBs #15

Closed RobertHodan closed 5 years ago

RobertHodan commented 5 years ago

The purpose of this PR is to enable page map extraction from Adobe Digital Editions EPUBs.

ADE uses its own extension to EPUB to include a page map, as a stand-alone XML file.

This is important to Evident Point and our friends at NYU as their NYU press books use this scheme to map the publications with physical page numbers. There are other publishers out there that may have created content that have also used this.

More information can be found here:

jccr commented 5 years ago

@danielweck ,

@RobertHodan and I work together on this :)

danielweck commented 5 years ago

Thanks :) I've just noticed that there is no "sanity check" for publication.PageList being populated (array .push()) first from EPUB3 NavDoc, then EPUB2 NCX, then AdobePageMap ... each time, "pages" are naively appended to the list.

I guess the order of precedence should be: NavDoc wins over NCX, which wins over AdobePageMap? What do you think?

NavDoc (fillTOCFromNavDoc()): https://github.com/readium/r2-shared-js/blob/90a96259ea07deb2924afdd7eb0ae56c81c33d22/src/parser/epub.ts#L1786

NCX (fillPageListFromNCX()): https://github.com/readium/r2-shared-js/blob/90a96259ea07deb2924afdd7eb0ae56c81c33d22/src/parser/epub.ts#L1547

Adobe (fillPageListFromPageMapXML): https://github.com/readium/r2-shared-js/blob/90a96259ea07deb2924afdd7eb0ae56c81c33d22/src/parser/epub.ts#L1585

As you can see, the order is NavDoc -> NCX -> AdobePageMap right now.

https://github.com/readium/r2-shared-js/blob/90a96259ea07deb2924afdd7eb0ae56c81c33d22/src/parser/epub.ts#L540-L558

jccr commented 5 years ago

I guess the order of precedence should be: NavDoc wins over NCX, which wins over AdobePageMap? What do you think?

I agree with this. It should work in a way that if there is a navdoc, but no page list, then it falls through to the next in the order.

RobertHodan commented 5 years ago

Thanks for the suggestion! I've added if conditions checking for the absence of PageList, as well as renaming the new method to hopefully make it a bit more clear.