Representing a selector that indicates a page range of a book

gobengo commented 6 years ago

I have the following highlight that I would like to represent as OA:

{ title: 'Relativity (Albert Einstein)',
  details: 
   { type: 'highlight',
     page: { from: 77, to: 77 },
     location: { from: 1173, to: 1174 },
     time: 2017-09-17T18:28:04.000Z },
  snippet: 'In What Respects are the Foundations of Classical Mechanics and of the Special Theory of Relativity Unsatisfactory?' }

The most obvious selector to create is a TextQuoteSelector, which I have done.

But I also don't want to throw away the page range or location range here, each of which is refinedBy the TextQuoteSelector#exact. This 'snippet' content could in fact appear multiple times in the book, and it would be quite useful to be able to clarify which pages this particular instance of the snippet appeared on.

I was surprised to find that the OA documents I quickly searched don't seem to ever represent a selector spanning a range of pages in a book (or even a PDF or something). But of course I could have missed a good example somewhere.

Any advice?

I am thinking I'd like the annotation target to have two selectors, one for the page range and one for the 'location' range, each refinedBy an exact TextQuoteSelector. Each of these ranges could be a RangeSelector with {start,end}Selector pointing to a logical 'Page in book' or 'Location in File'. The latter is left somewhat intentionally vague, but surely there is a good vocab term somewhere for 'BookPage'? @azaroth42 do you know of a good term? I couldn't find one on schema.org or in dc.
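A rough sketch of that shape, to make the idea concrete. The `page=`/`location=` fragment values and the `urn:example:relativity` source are made up for illustration and are not defined by the Web Annotation model. Also, the model says a RangeSelector runs up to, but does not include, the start of its endSelector, so this sketch assumes `to` is inclusive and points the end at `to + 1`. A real annotation would also need an `id`.

```python
import json

# The highlight from above, with the timestamp kept as an ISO 8601 string.
highlight = {
    "title": "Relativity (Albert Einstein)",
    "details": {
        "type": "highlight",
        "page": {"from": 77, "to": 77},
        "location": {"from": 1173, "to": 1174},
        "time": "2017-09-17T18:28:04.000Z",
    },
    "snippet": "In What Respects are the Foundations of Classical Mechanics "
               "and of the Special Theory of Relativity Unsatisfactory?",
}


def range_selector(unit, span, quote):
    """Build a RangeSelector over an inclusive span, refined by the quote.

    ASSUMPTION: "<unit>=<n>" is a hypothetical fragment syntax, and the end
    is set to span["to"] + 1 because a RangeSelector excludes its endSelector.
    """
    def endpoint(n):
        return {"type": "FragmentSelector", "value": f"{unit}={n}"}

    return {
        "type": "RangeSelector",
        "startSelector": endpoint(span["from"]),
        "endSelector": endpoint(span["to"] + 1),
        "refinedBy": {"type": "TextQuoteSelector", "exact": quote},
    }


def highlight_to_annotation(highlight, source):
    """Map one highlight to an annotation whose target has two selectors."""
    details = highlight["details"]
    quote = highlight["snippet"]
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "highlighting",
        "created": details["time"],
        "target": {
            "source": source,  # hypothetical IRI standing in for the book
            "selector": [
                range_selector("page", details["page"], quote),
                range_selector("location", details["location"], quote),
            ],
        },
    }


annotation = highlight_to_annotation(highlight, "urn:example:relativity")
print(json.dumps(annotation, indent=2))
```

Listing both selectors on one target treats them as two descriptions of the same segment, which seems to match what I want here.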

azaroth42 commented 6 years ago

PDF pages can be referenced with a FragmentSelector, see the example:

https://www.w3.org/TR/annotation-model/#fragment-selector

And pages in an EPUB could be referenced with a CFI fragment, in the same example table.
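As a minimal sketch of both options, assuming the `conformsTo` IRIs listed in that table: the helper names, the `http://example.org/relativity.pdf` source, and the CFI path are placeholders for illustration only.

```python
import json

# conformsTo IRIs as listed in the FragmentSelector table of the model spec.
PDF_FRAGMENTS = "http://tools.ietf.org/rfc/rfc3778"
EPUB_CFI = "http://www.idpf.org/epub/linking/cfi/epub-cfi.html"


def pdf_page_selector(page, quote=None):
    """FragmentSelector for one PDF page, optionally refined by a quote."""
    selector = {
        "type": "FragmentSelector",
        "conformsTo": PDF_FRAGMENTS,
        "value": f"page={page}",
    }
    if quote is not None:
        selector["refinedBy"] = {"type": "TextQuoteSelector", "exact": quote}
    return selector


def epub_cfi_selector(cfi_path):
    """FragmentSelector wrapping an EPUB CFI path supplied by the caller."""
    return {
        "type": "FragmentSelector",
        "conformsTo": EPUB_CFI,
        "value": f"epubcfi({cfi_path})",
    }


pdf_target = {
    "source": "http://example.org/relativity.pdf",  # hypothetical PDF URL
    "selector": pdf_page_selector(77, quote="Special Theory of Relativity"),
}
# The CFI path below is a placeholder, not a location in a real EPUB.
epub_selector = epub_cfi_selector("/6/4[chap01ref]!/4[body01]/10[para05]/3:10")

print(json.dumps(pdf_target, indent=2))
print(json.dumps(epub_selector, indent=2))
```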

However, saying that a Web Annotation annotates the content printed on a physical page crosses the line into out of scope. And, frankly, this is not something that even library linked open data can handle very gracefully yet. In particular, does it annotate the abstract content or the exact phrase? Is it only in one edition of the book, or only one specific copy of the book? Is it exactly that physical area of the page, regardless of edition?

So my advice is to think long and hard about your exact use cases 😀 More usefully, if you want to use schema.org, you could have CreativeWorks that are partOf other CreativeWorks, and use position to manage the page numbering.
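One possible reading of that, as a hedged sketch: in schema.org the property is spelled `isPartOf`, and `position` and `isbn` are existing schema.org properties. Modelling a page as a bare CreativeWork, and the helper name itself, are assumptions made for illustration.

```python
import json


def page_as_creative_work(book_title, page_number, isbn=None):
    """Describe one page as a CreativeWork that isPartOf a Book.

    ASSUMPTION: treating a page as its own CreativeWork is a modelling
    choice for this sketch, not something schema.org prescribes.
    """
    book = {"@type": "Book", "name": book_title}
    if isbn is not None:
        book["isbn"] = isbn  # only included when the caller knows it
    return {
        "@context": "https://schema.org",
        "@type": "CreativeWork",
        "position": page_number,
        "isPartOf": book,
    }


page = page_as_creative_work("Relativity (Albert Einstein)", 77)
print(json.dumps(page, indent=2))
```

A page range could then be expressed as one such node per page, with `position` giving the ordering.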

gobengo commented 6 years ago

@azaroth42 In this case, the highlights are from my Kindle so they could be on any sort of document: pdf, mobi, epub, etc. At this layer of abstraction I don't always know. But I'd like to find a way of also looking up more metadata about the underlying book, i.e. ISBN and stuff. Other than using the ISBN, there's not always a good URL to use to refer to the document.

CreativeWork + partOf + position gives me a great starting point to work with, though. Thanks for the quick response!

Also I will post code soon.

gobengo commented 6 years ago

@azaroth42 I have not made use of your advice yet but you can see the project here: https://github.com/gobengo/kindle-web-annotations

w3c / web-annotation

Representing a selector that indicates a page range of a book #436