hypothesis / client

The Hypothesis web-based annotation client.

Investigate issue with out-of-sync VitalSource chapter titles #4986

Closed robertknight closed 1 year ago

robertknight commented 1 year ago

While working on https://github.com/hypothesis/client/pull/4985 I found a book where the chapter headings returned by the VitalSource book viewer's internal APIs are out of sync with the content after a certain point. As a result, the chapter headings we capture with annotations as part of the EPUBContentSelector selector have the wrong value, and the titles displayed above annotations (see https://github.com/hypothesis/client/pull/4985) are wrong. We need to figure out whether this is a widespread problem, and either work with VS to resolve it or work around it.

The table of contents displayed in the TOC tree is correct, and in sync with the actual content.

Details:

https://bookshelf.vitalsource.com/reader/books/9781319367077 is a book that was annotated heavily by one class that trialled the Hypothesis <-> VitalSource integration earlier in the year.

The <mosaic-book> element's getCurrentPage() API returns objects with a chapterTitle property. For the initial pages in the book, this property has a value that matches the actual content. Around the 8th/9th chapter, there are two successive chapters that have the same chapterTitle value ("Contents"), and thereafter the chapterTitle value is one entry behind where it should be (i.e. it contains the title of the chapter before the one it is supposed to describe).

Slack thread: https://vitalsource.slack.com/archives/C01208U1A2F/p1668692597345219

robertknight commented 1 year ago

Here are the two successive entries in the output of <mosaic-book>'s getPages() API where the problem occurs. Note the repeated chapter title:

{
    "path": "/OEBPS/xhtml/lun_9781319056261_contents.xhtml",
    "absoluteURL": "/books/9781319367077/epub/OEBPS/xhtml/lun_9781319056261_contents.xhtml",
    "cfi": "/6/18[;vnd.vst.idref=fm8]",
    "cfiWithoutAssertions": "/6/18",
    "linear": true,
    "page": "xix",
    "chapterTitle": "Contents",
    "layout": null,
    "spread": null,
    "orientation": null,
    "pageSpread": null
},
{
    "path": "/OEBPS/xhtml/lun_9781319056261_part01.xhtml",
    "absoluteURL": "/books/9781319367077/epub/OEBPS/xhtml/lun_9781319056261_part01.xhtml",
    "cfi": "/6/20[;vnd.vst.idref=part01]",
    "cfiWithoutAssertions": "/6/20",
    "linear": true,
    "page": "1",
    "chapterTitle": "Contents",
    "layout": null,
    "spread": null,
    "orientation": null,
    "pageSpread": null
}
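
To check whether other books are affected, one option is to scan the getPages() output for the pattern shown above: two successive entries that point at different files but carry the same chapterTitle. The following is only a rough sketch. The PageInfo type is a trimmed-down guess based on the fields in the JSON above, and findRepeatedChapterTitles is a hypothetical helper, not something that exists in the client.

```typescript
// Assumed subset of the page objects shown above; all other fields omitted.
interface PageInfo {
  path: string;
  page: string;
  chapterTitle: string;
}

/**
 * Return the indexes of entries whose `chapterTitle` repeats the previous
 * entry's title even though the two entries refer to different files.
 *
 * These are only candidates: a chapter that is legitimately split across
 * several files would be flagged too, so results need a manual check.
 */
function findRepeatedChapterTitles(pages: PageInfo[]): number[] {
  const repeated: number[] = [];
  for (let i = 1; i < pages.length; i++) {
    const prev = pages[i - 1];
    const curr = pages[i];
    if (curr.path !== prev.path && curr.chapterTitle === prev.chapterTitle) {
      repeated.push(i);
    }
  }
  return repeated;
}

// The two entries from this comment, reduced to the fields used here.
const samplePages: PageInfo[] = [
  {
    path: "/OEBPS/xhtml/lun_9781319056261_contents.xhtml",
    page: "xix",
    chapterTitle: "Contents",
  },
  {
    path: "/OEBPS/xhtml/lun_9781319056261_part01.xhtml",
    page: "1",
    chapterTitle: "Contents",
  },
];

// The second entry (index 1) is flagged as a suspect.
const suspects = findRepeatedChapterTitles(samplePages);
```

A flagged index marks the likely point from which later chapterTitle values lag by one entry, as described in the first comment.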

robertknight commented 1 year ago

In the above Slack thread we got a suggestion from Brett at VS to consider the getTOC() API of the <mosaic-book> element instead. If I understand correctly, the data source for that is also used by the table-of-contents panel in Bookshelf. It also gives us the ability to retrieve the hierarchy of section headings, which might be helpful to display in the sidebar.
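
As a rough illustration of how TOC data could supply both the chapter title and its parent headings, here is a sketch. The shape of the getTOC() result is not described in this thread, so the TOCEntry type below (a flat, document-ordered list with a title, path and nesting level) is purely an assumption, headingsForPath is a hypothetical helper, and the sample entries are made up.

```typescript
// ASSUMPTION: a flat, document-ordered list of TOC entries. The real
// getTOC() result may be shaped differently.
interface TOCEntry {
  title: string;
  path: string; // may include a "#fragment"
  level: number; // 1 = top-level heading
}

/**
 * Return the heading hierarchy (outermost first) for the first TOC entry
 * that points at `pagePath`, or an empty array if there is no match.
 */
function headingsForPath(toc: TOCEntry[], pagePath: string): string[] {
  const index = toc.findIndex(e => e.path.split("#")[0] === pagePath);
  if (index === -1) {
    return [];
  }
  const headings = [toc[index].title];
  let level = toc[index].level;
  // Walk backwards, collecting the nearest entry at each shallower level.
  for (let i = index - 1; i >= 0 && level > 1; i--) {
    if (toc[i].level < level) {
      headings.unshift(toc[i].title);
      level = toc[i].level;
    }
  }
  return headings;
}

// Made-up sample data, not taken from the book discussed in this issue.
const sampleTOC: TOCEntry[] = [
  { title: "Contents", path: "/text/contents.xhtml", level: 1 },
  { title: "Part 1", path: "/text/part01.xhtml", level: 1 },
  { title: "Chapter 1", path: "/text/ch01.xhtml#start", level: 2 },
];

// ["Part 1", "Chapter 1"]
const hierarchy = headingsForPath(sampleTOC, "/text/ch01.xhtml");
```

Matching on the file path rather than on position in the page list avoids depending on chapterTitle values from getPages() staying aligned with the content.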

robertknight commented 1 year ago

This was resolved in https://github.com/hypothesis/client/pull/5061.