Proposal: OPSS (Open Publication Synchronization Service)

Fmstrat commented 5 years ago

Hi everyone,

I'd like to make a proposal that may be of interest to you based on your backgrounds:

@janeczku - Author of Calibre-web
@HadrienGardeur - Maintainer of this repo and part of the OPDS community
@opds-community - The OPDS community
@geometer - Author of FBReader

Hopefully I've described you all appropriately, but if not, my apologies. I'm sure there are others who would make sense to pull into this conversation, and hopefully those already mentioned can reference them here.

While the OPDS spec is widely used (http://opds-spec.org/), one thing that has not yet been standardized in open readers is synchronization of read status, notes, bookmarks, and highlights. Many reader applications have these features (including FBReader), but even when the reader itself is open, each relies on its own built-in, application-specific system.

My proposal is the creation of OPSS (Open Publication Synchronization Service) to accompany OPDS and hopefully be integrated into many of the solutions offering OPDS.

Initial considerations:

Web based, to ensure compatibility with existing OPDS solutions
JSON or XML response to allow for future addition of fields without breaking backwards compatibility
Back-end storage of data is up to the software developer, but guidance around DB structure or on-filesystem structures could be provided
Guidance around authentication should be determined. (Shared user/pass with OPDS? API Key for public libraries? etc)
URL service location (always a subdirectory of OPDS to not require a separate URL?)
Should the spec allow for synchronization of items outside the OPDS library?

The following methods should be considered for implementation (naming is placeholder):

setposition: Sets the current and furthest read position for one publication
getposition: Gets the current and furthest read position for one publication
getallpositions: Gets the positions for all publications (for quick synchronization of all books in the reader at load time for apps)
getmeta: Gets read positions, notes, bookmarks, and highlights for one publication
setmeta: Sets read positions, notes, bookmarks, and highlights for one publication
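
As a rough illustration of how the placeholder position methods above could behave, here is a minimal in-memory sketch in Python. The `SyncStore` and `Position` names, the string publication id, and the use of a plain integer as the position are all assumptions made for the example; none of them come from OPDS or from any existing spec.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Position:
    current: int = 0   # assumption: position is an opaque, comparable integer
    furthest: int = 0  # furthest position ever reported for this publication


@dataclass
class SyncStore:
    """Hypothetical in-memory store keyed by publication id."""
    positions: Dict[str, Position] = field(default_factory=dict)

    def setposition(self, pub_id: str, current: int) -> Position:
        pos = self.positions.setdefault(pub_id, Position())
        pos.current = current
        # The furthest position only ever moves forward.
        pos.furthest = max(pos.furthest, current)
        return pos

    def getposition(self, pub_id: str) -> Position:
        # Unknown publications report a zeroed position.
        return self.positions.get(pub_id, Position())

    def getallpositions(self) -> Dict[str, Position]:
        return dict(self.positions)


store = SyncStore()
store.setposition("book-a", 120)
store.setposition("book-a", 80)  # the reader jumped back to re-read a passage
```

Keeping `furthest` separate from `current` is what lets a reader offer "jump to furthest read" after the user has paged backwards.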

Anyone interested in pursuing this?

Add-ons: Adding the people below to the thread because I believe their projects are likely candidates for initial implementations (along with calibre-web on the server side) of a cross-platform solution (common platforms, actively developed, and FOSS).

@axet - Author of Book Reader
@babluboy - Author of Bookworm

Relocated from: https://github.com/Feedbooks/opds-test-catalog/issues/1
Similar to https://github.com/opds-community/drafts/issues/27

Previously discussed with interest from @geometer (Author of FBReader)

(1) How to identify a book? Note that ePub is not the only book format.

Perhaps MD5 the file for a unique id?
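
A minimal sketch of the MD5 idea in Python, using only the standard library. The function name `file_md5` and the chunk size are assumptions for illustration. One caveat with this approach: any change to the file's bytes (for example, a tool rewriting embedded metadata) produces a different id.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```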

(2) How to record a position?

Count the words up to the first word displayed on screen. That could become the starting point for any reader. It would also let us calculate percent read, and page numbers by defining an average number of words per screen.
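
The word-count idea can be sketched as follows. This assumes the book has already been reduced to plain text and that words are whitespace-separated; the helper names and the default of 250 words per screen are made up for the example.

```python
from typing import Tuple


def words_before(text: str, char_offset: int) -> int:
    """Count whitespace-separated words that end before char_offset."""
    return len(text[:char_offset].split())


def progress(word_index: int, total_words: int,
             words_per_screen: int = 250) -> Tuple[float, int]:
    """Return (percent read, 1-based page number) for a word position."""
    percent = 100.0 * word_index / total_words if total_words else 0.0
    page = word_index // words_per_screen + 1
    return percent, page
```

For example, `progress(500, 2000)` describes a reader a quarter of the way through a 2000-word text, on the third 250-word screen.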

HadrienGardeur commented 5 years ago

JSON or XML response to allow for future addition of fields without breaking backwards compatibility

I think such a spec should integrate with both OPDS 1.2 and 2.0 through a link, but I don't think that the "Synchronization Spec" would need both XML and JSON.

In 2019, I would simply default to JSON or JSON-LD.

Back-end storage of data is up to the software developer, but guidance around DB structure or on-filesystem structures could be provided

IMO that's all out of scope for such a document.

Guidance around authentication should be determined. (Shared user/pass with OPDS? API Key for public libraries? etc)

See https://drafts.opds.io/authentication-for-opds-1.0

URL service location (always a subdirectory of OPDS to not require a separate URL?)

I'm not sure what you're talking about with that one. If you mean a "well-known location", I don't think that's the way to go.

The synchronization service should be discoverable using a link.

Anyone interested in pursuing this?

NYPL has developed something like that for SimplyE that integrates with OPDS; it might be worth having @leonardr chime in as well.

axet commented 5 years ago

I'm using offline p2p synchronisation in my book reader. If you create a compatible JSON format for device-to-device synchronisation, I'm interested. Keep in mind that syncing across devices can create conflicts if a book is read on two devices, and both the synchronisation software (which creates files with a -conflict suffix) and the book reader (which shows conflict dialogs) have to deal with them. I'm not interested in a centralised-only solution; not everyone wants to set up a server.

But I store a bit more information about books such as:

{
  "created": 1539355003077,  // added to library
  "last": 1553022686211, // last viewed
  "title": "Fallout_ Equestria", // can be renamed by user
  "position": [
    95, // paragraph
    0, // element
    0 // char
  ],
  "fontsize_6e01be161064d066": 45, // per device id font size, optimal to have per every device
  "bookmarks": [
    {
      "last": 1553022479332, // last change time for synchronization
      "name": "", // user name
      "text": "A pony’s PipBuck generates an E.F.S. (Eyes-Forward Sparkle) that will indicate direction and help gauge whether the ponies or creatures around you are hostile.", // actual highlight text
      "color": 0,
      "start": [
        97,
        223,
        0
      ],
      "end": [
        97,
        269,
        8
      ]
    },
    {
      "last": 1553022485681,
      "name": "",
      "text": "And, perhaps most impressively, a PipBuck can magically aid you in a fight for brief periods of time through use of the S.A.T.S. (Stable-Tec Arcane Targeting Spell).",
      "color": 0,
      "start": [
        97,
        271,
        0
      ],
      "end": [
        97,
        323,
        7
      ]
    }
  ]
}
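
To make the conflict problem concrete, here is one possible way two devices could reconcile bookmark lists shaped like the ones above. It is a simple last-writer-wins merge keyed on the `start`/`end` ranges and the `last` timestamp. This is not how Book Reader resolves conflicts (that is described above as -conflict files plus dialogs); the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[Tuple[int, ...], Tuple[int, ...]]


def bookmark_key(bookmark: dict) -> Key:
    """Identify a bookmark by its (start, end) range."""
    return (tuple(bookmark["start"]), tuple(bookmark["end"]))


def merge_bookmarks(local: List[dict], remote: List[dict]) -> List[dict]:
    """Union two bookmark lists; on a clash keep the most recently changed one."""
    merged: Dict[Key, dict] = {}
    for bookmark in local + remote:
        key = bookmark_key(bookmark)
        if key not in merged or bookmark["last"] > merged[key]["last"]:
            merged[key] = bookmark
    # Sort by range so the result is stable regardless of input order.
    return sorted(merged.values(), key=bookmark_key)
```

Last-writer-wins silently discards the older edit, so a reader that wants the user to decide would still need something like the conflict dialog mentioned above.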

Fmstrat commented 5 years ago

@axet If you have a moment, I have a couple of questions about your implementation:

Out of curiosity, why did you choose to use paragraph, element, and char for positional look ups?
Is this position the top-leftmost character displayed on the current screen?
How are you identifying a unique book (or does that not matter due to the p2p nature of the tool)?

@HadrienGardeur

JSON or XML response to allow for future addition of fields without breaking backwards compatibility

I think such a spec should integrate with both OPDS 1.2 and 2.0 through a link, but I don't think that the "Synchronization Spec" would need both XML and JSON.

In 2019, I would simply default to JSON or JSON-LD.

Agreed, now that I've reviewed the 2.0 draft spec, JSON is definitely the way to go.

Back-end storage of data is up to the software developer, but guidance around DB structure or on-filesystem structures could be provided

IMO that's all out of scope for such a document.

Agreed.

Guidance around authentication should be determined. (Shared user/pass with OPDS? API Key for public libraries? etc)

See https://drafts.opds.io/authentication-for-opds-1.0

Makes sense to me.

URL service location (always a subdirectory of OPDS to not require a separate URL?)

I'm not sure what you're talking about with that one. If you mean a "well-known location", I don't think that's the way to go.

The synchronization service should be discoverable using a link.

I guess the underlying question is: should OPSS be a separate service with its own discoverable URL, or be integrated into OPDS at a standard location?

Anyone interested in pursuing this?

NYPL has developed something like that for SimplyE that integrates with OPDS; it might be worth having @leonardr chime in as well.

Would certainly love as much input from interested parties as possible.

HadrienGardeur commented 5 years ago

It's also important to point out the work we're doing at Readium on locators, which could be the building block for such a service: https://readium.org/architecture/locators/

leonardr commented 5 years ago

We use the Web Annotation Protocol to sync bookmarks and last reading position across devices. At a glance it covers all the use cases here, and it's a well-defined protocol with multiple independent implementations.

In particular, WAP defines a relation for discovery. Here's how we link to the annotation endpoint for a specific book in an OPDS 1.2 feed. Note the distinctive link relation and media type:

    <link href="https://circulation.librarysimplified.org/NYNYPL/annotations/Gutenberg%20ID/40278/" type="application/ld+json; profile=&quot;http://www.w3.org/ns/anno.jsonld&quot;" rel="http://www.w3.org/ns/oa#annotationService"/>
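
On the client side, discovering that link could look roughly like the sketch below, which uses Python's standard `xml.etree.ElementTree`. The function name and the sample entry (including the example.org URL) are made up for illustration; only the link relation and media type are taken from the snippet above.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "{http://www.w3.org/2005/Atom}"
ANNOTATION_REL = "http://www.w3.org/ns/oa#annotationService"


def find_annotation_service(entry_xml: str) -> Optional[str]:
    """Return the href of the first annotation-service link, or None."""
    root = ET.fromstring(entry_xml)
    for link in root.iter(ATOM_NS + "link"):
        if link.get("rel") == ANNOTATION_REL:
            return link.get("href")
    return None


# Hypothetical OPDS 1.2 (Atom) entry used only to exercise the function.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link href="https://example.org/annotations/book-1/"
        type="application/ld+json; profile=&quot;http://www.w3.org/ns/anno.jsonld&quot;"
        rel="http://www.w3.org/ns/oa#annotationService"/>
</entry>"""
```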

Since annotation documents are JSON-LD, I believe you can store whatever extra information you want to keep, bibliographic or otherwise.

The only extension we've had to make is a custom "motivation" (i.e. a reason why a person would create an annotation). We gave it the URI http://librarysimplified.org/terms/annotation/idling and it's used to indicate the last reading position -- a part of the book that is only special because it's the last thing that was on your screen when you closed the reader.
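
For readers unfamiliar with the format, a last-reading-position annotation might be assembled roughly like this. The overall shape follows the W3C Web Annotation Data Model in general terms; the exact document the NYPL servers produce or accept is not shown in this thread, and the function name, book URI, and CFI value are invented for the example.

```python
import json

IDLING = "http://librarysimplified.org/terms/annotation/idling"


def last_position_annotation(book_uri: str, cfi: str) -> dict:
    """Build a Web Annotation marking the last reading position (illustrative)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": IDLING,  # the custom motivation described above
        "target": {
            "source": book_uri,
            "selector": {
                # A FragmentSelector carrying an EPUB CFI as its value.
                "type": "FragmentSelector",
                "conformsTo": "http://www.idpf.org/epub/linking/cfi/epub-cfi.html",
                "value": cfi,
            },
        },
    }


# Placeholder values; neither identifies a real book or location.
doc = last_position_annotation("urn:example:book", "epubcfi(/6/4!/4/2)")
payload = json.dumps(doc, indent=2)
```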

We don't promise to accept requests from any arbitrary WAP client, but you can use WAP to extract all of your annotations from one of our servers.

As Hadrien points out (and I think Fmstrat is also getting at), another crucial piece of the puzzle that's currently missing is a reliable way to identify a specific place in a book. We use EPUB Canonical Fragment Identifiers (CFIs), but they're not very reliable, and we're hoping for something better to come out of Readium's "locators" project.

opds-community / drafts

Proposal: OPSS (Open Publication Synchronization Service) #28
