hypothesis / client

The Hypothesis web-based annotation client.

Chapter titles are duplicated in VitalSource PDF-based books #5064

Closed robertknight closed 1 year ago

robertknight commented 1 year ago

The Hypothesis sidebar currently groups annotations in VitalSource (VS) books by content document CFI and displays one title per group. For PDF-based books, VS treats each page as a separate content document, so each page gets its own group and heading in the sidebar.
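To make the problem concrete, here is a minimal sketch of grouping by content document CFI. The `PageAnnotation` and `CFIGroup` types, the `groupByContentDocument` function, and the CFI strings are all hypothetical and only illustrate the behavior described above; they are not the client's actual data model.

```typescript
// Hypothetical annotation shape, for illustration only.
interface PageAnnotation {
  id: string;
  cfi: string; // CFI of the content document (one per page in PDF-based books)
  chapterTitle: string; // title reported for that content document
}

interface CFIGroup {
  cfi: string;
  title: string;
  annotations: PageAnnotation[];
}

// Group annotations by content document CFI, keeping first-seen order.
function groupByContentDocument(annotations: PageAnnotation[]): CFIGroup[] {
  const groups = new Map<string, CFIGroup>();
  for (const ann of annotations) {
    let group = groups.get(ann.cfi);
    if (!group) {
      group = { cfi: ann.cfi, title: ann.chapterTitle, annotations: [] };
      groups.set(ann.cfi, group);
    }
    group.annotations.push(ann);
  }
  return Array.from(groups.values());
}

// Two annotations on different pages of the same chapter (made-up CFIs).
const exampleGroups = groupByContentDocument([
  { id: 'a1', cfi: '/6/2', chapterTitle: 'Chapter 1' },
  { id: 'a2', cfi: '/6/4', chapterTitle: 'Chapter 1' },
]);
// exampleGroups has two entries, both titled "Chapter 1".
```

Because the grouping key is the page-level CFI, the two pages never share a group, and the chapter title is rendered once per page.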

This screenshot from https://bookshelf.vitalsource.com/reader/books/9781412993517 shows that annotating different pages in the same book chapter causes the same heading to appear multiple times in the sidebar:

[Screenshot: VS PDF book]

robertknight commented 1 year ago

The data returned by the getCurrentPage API looks like this. Note the repeated chapter titles:

[Screenshot: PDF book getPages data]

robertknight commented 1 year ago

Some possible solutions:

  1. Record separately a CFI for the content document where the annotation was made, which will mean the page in a PDF-based book, and a CFI for the table of contents entry that corresponds to the current location. In EPUB-based books, there can be multiple TOC entries for the same content document. In PDF-based books there can be multiple content documents for the same TOC entry.
  2. Hide headings that have the same text as the previous heading.
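
The two options can be sketched roughly as follows. Every name here (`LocatedAnnotation`, `documentCFI`, `tocCFI`, `groupByTOCEntry`, `TitledGroup`, `markRepeatedHeadings`) is a hypothetical illustration, not an existing client API, and the sketch assumes groups arrive already sorted in document order.

```typescript
// Option 1 (hypothetical shape): store both CFIs on each annotation.
interface LocatedAnnotation {
  id: string;
  documentCFI: string; // content document; a single page in PDF-based books
  tocCFI: string; // table of contents entry covering this location
  tocTitle: string;
}

// Group by TOC entry instead of content document, keeping first-seen order.
function groupByTOCEntry(
  annotations: LocatedAnnotation[],
): Map<string, LocatedAnnotation[]> {
  const groups = new Map<string, LocatedAnnotation[]>();
  for (const ann of annotations) {
    const existing = groups.get(ann.tocCFI);
    if (existing) {
      existing.push(ann);
    } else {
      groups.set(ann.tocCFI, [ann]);
    }
  }
  return groups;
}

// Option 2 (hypothetical shape): keep the existing groups, but flag headings
// whose text repeats the heading immediately before them.
interface TitledGroup {
  title: string;
  annotationIds: string[];
}

interface RenderedGroup extends TitledGroup {
  showHeading: boolean;
}

function markRepeatedHeadings(groups: TitledGroup[]): RenderedGroup[] {
  const result: RenderedGroup[] = [];
  let previousTitle: string | undefined;
  for (const group of groups) {
    // The first group always shows its heading, since no title precedes it.
    result.push({ ...group, showHeading: group.title !== previousTitle });
    previousTitle = group.title;
  }
  return result;
}
```

Option 2 is the smaller change, but it compares text only: two adjacent TOC entries that happen to share a title would be shown under a single heading. Option 1 avoids that by keying on the TOC entry itself, at the cost of recording an extra CFI per annotation.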