[FEATURE] KOReader sync implementation

avidal commented 11 months ago

The KOReader^koreader reader software is very popular. It supports a large number of e-readers, and is often used as an alternative reader on jailbroken Kindles. It can be natively installed on Kobo and Boox e-readers (or any e-reader that supports Android applications).

There are additionally other clients that can sync with a KOReader sync server, such as Obsidian^obsidian-ko, which allows you to annotate or highlight books in a KOReader client (such as your Kobo), and then pull those annotations into Obsidian.

This is all powered by a simple API. There is an official implementation^official-impl as well as numerous other implementations, such as a single-file Go implementation^go-impl and a Rust implementation^rust-impl.

I'm assuming there are general plans to support progress sync in Stump, and I'm also assuming that (currently) the first implementation would be based on pages fetched over OPDS-PS. Considering that implementation will need to come with some datastore for tracking progress, a koreader-sync implementation would be able to leverage that same datastore.

For reference purposes, the actual API consists of five endpoints (register user, check auth, update progress, get progress, and healthcheck)^routes and one JSON schema that is used both to update the progress on a document and to fetch the current progress given a document id. The JSON schema also includes a unique device id.

aaronleopold commented 11 months ago

Thanks for all of the references, it's really helpful and I appreciate it! This is really cool and I'm definitely interested in getting support for this at some point.

I'm assuming there are general plans to support progress sync in Stump, and I'm also assuming that (currently) the first implementation would be based on pages fetched over OPDS-PS. Considering that implementation will need to come with some datastore for tracking progress, a koreader-sync implementation would be able to leverage that same datastore.

There already is progress 'sync' support in Stump, both via OPDS-PS and then natively through the API. Whenever you fetch a page in ODPS-PS it will update the progress, as you suggested. Epubs are a bit different, though. Currently, epub progress is tracked via a combination of percentage completed and your current epubcfi string. Perhaps you strictly meant wrt koreader-sync, but I wanted to clarify just in case.

I'll try and review what would be involved on my end to support this sometime soon. Not sure how open you are to contributing, but I'd be happy to help get you going if that is of interest and you have the time.

avidal commented 10 months ago

re: progress support

Ah, I was looking for an implementation of that but couldn't quite find it. But now that I'm looking it's pretty obvious. I think I must have been looking at a different codebase (hard to juggle all of those open tabs).

As far as contribution goes, I'm pretty interested. I'm still getting my new setup going and will gone for the weekend but I may take a stab at it myself after that.

The general open question is: should the koreader sync API be built-in, or should there just be a stump-koreader-sync binary that communicates with the stump API? It's perhaps not a really important question to answer right now, but it may set the stage for how "integrations" are built in the future.

aaronleopold commented 10 months ago

As far as contribution goes, I'm pretty interested. I'm still getting my new setup going and will gone for the weekend but I may take a stab at it myself after that.

Sounds good, if you wind up giving it a shot feel free to give me a poke if you need anything and I'll try and help where I can

The general open question is: should the koreader sync API be built-in, or should there just be a stump-koreader-sync binary that communicates with the stump API?

Good question. It likely depends on how large the integration itself winds up being. A separate service just for this can maybe be more scalable, but not sure how much that really matters here. Regardless, I feel it should be constructed in a way to support isolated or integrated runs. A clean example of what I mean would be how the server can actually be bundled in and started from the desktop app.

avidal commented 10 months ago

A clean example of what I mean would be how the server can actually be bundled in and started from the desktop app.

Ah, right. My question wasn't really about scalability (at least from a resource perspective) but about maintainability. I forgot that this was a monorepo.

I think the most straightforward may be a crate that exposes endpoints, and there's a config switch that maps the endpoint implementations to the existing http muxer.

The separate crate could be both a library and a binary crate. The main stump crate could include the koreader-sync library by default. The koreader-sync binary crate could expose its own http interface.

Someone running a NAS may then choose to run stump without the sync and run the sync separately via its own binary.

Or something along those lines.

aaronleopold commented 10 months ago

Gotcha. Well, what you proposed sounds good to me! It would likely fit nicely in either the crates or apps directories of the monorepo, but it it makes sense to eventually break it out it isn't the end of the world (if anything, it would make isolated versioning and releasing less hassle down the road)

aaronleopold commented 1 month ago

I've been thinking about this one recently, and I'll likely take a shot at implementing it sometime in the next few months.

One thing that I wanted to ask generally after reviewing the discussion here is whether it really makes sense to develop the integration as (optionally) a separate binary? I understand your previous point re: setting precedence for future integrations, however for integrations developed and maintained in the repo itself I feel it might be more reasonable to bake it in. I can't think of a use case where someone who wants this integration would rather deploy a separate service. I could be wrong of course, I'll try and actually use Koreader properly before giving the implementation a go

aaronleopold commented 1 month ago

For those who might be following this issue, I've started development here. The API is relatively straightforward, and there are only two hiccups so far, one of which is a bit of a show stopper at the moment:

I am not 100% sure what progress refers to in the get/put progress payloads. It looks like it might be some reference to an HTML element, but isn't a valid epubcfi string
It looks like koreader sends an md5 hash of the password you provide to login. This is inherently a problem, stump does not and won't change to use the same hashing algorithm for passwords. I'm not really sure if there is an immediate work around besides knocking out https://github.com/stumpapp/stump/issues/447 first

I'll continue working on this here and there, but I feel I'll have to add the api key kind of concept first to unblock it

avidal commented 1 month ago

This all super well timed. I actually finally got myself a Kobo last week (work trip to Paris meant I could finally find one in-store to test) and the whole act of shifting ebooks from my Kindle to the Kobo got me thinking about Stump (and then this specific feature request) again.

As far as where the API integration goes, the main concern is around code organization right? Because let's say you get koreader integration, and then kobo-api (like calibre-web does), and then maybe another one, it's all about making sure these different integration layers don't result in an unmaintainable mess of conditionals layered through the source. Ultimately you'd know far better than I would the best layout for this particular codebase.

I am not 100% sure what progress refers to in the get/put progress payloads.

I think it's just generally going to be an opaque marker that only koreader knows what to do with. If you're looking for an opportunity to "normalize" progress between koreader and the browser reader (or other readers) I think the closest you'll get is using the percentage that koreader reports?

It looks like koreader sends an md5 hash of the password you provide to login.

Yeah. I think the best course here is to generate a URL as mentioned in #447 that encodes the identifying information and then just ignore whatever password the device provides. Technically your personal "sync url" would be leaked but is that really a problem? If so, generate a "token" when generating the sync URL, where the token is an md5 of some random short string. The app could show the user their full "credentials".

aaronleopold commented 1 month ago

This all super well timed. I actually finally got myself a Kobo last week

Nice! I keep meaning to get one but never get around to it.

As far as where the API integration goes, the main concern is around code organization right?

For the most part, for sure. After really giving the sync API a read through and seeing how small it is, I didn't want to get too bogged down architecting a solution for future integrations that may never get developed - I feel it can be easy to over-think and lose productivity at times. This isn't to say the example integration (kobo-api) you mentioned would never come! But I am okay with a bit of refactoring down the line if juggling multiple integrations becomes a serious organizational/maintenance problem.

If you're looking for an opportunity to "normalize" progress between koreader and the browser reader (or other readers) I think the closest you'll get is using the percentage

Yeah that's a bit unfortunate. It isn't a huge deal, but I thought it would be neat if I could get the progress to "fully" sync (i.e. you re-open a book in Stump and it is exactly where you were). Stump does track percentage, at least, but it uses primarily epubcfi strings for landing you back where you were for EPUBs.

I think the best course here is to generate a URL as mentioned in https://github.com/stumpapp/stump/issues/447 that encodes the identifying information and then just ignore whatever password the device provides

Yep! That's exactly what I've been doing 🙂 TBH the sync aspect is likely ready for initial testing, but I'm building out the API key management first since you need to be able to create one to use the sync endpoints

Sneak peek

Technically your personal "sync url" would be leaked but is that really a problem?

IMO, no. I'm building it so you can give explicit permissions to the generated token. So for example, by the time this feature is ready for merge you could just create an API key with one permission: feature:koreader_sync (actual value tbd). If that key got leaked, worst someone could do is read some books for you on their on koreader app haha. Obviously if you give it a bunch of permissions that would be a different story I guess

avidal commented 1 month ago

I am not 100% sure what progress refers to in the get/put progress payloads.

I did some digging in koreader/koreader. It looks like, generally speaking, for page-based documents, sync progress will be a page number. koreader also supports a "scrolling" viewer where you don't explicitly change pages in which case the sync progress is an xpointer, which I believe is really just an xpath that maps to the location in the DOM at the top of the screen at the time of sync.

A "complete" integration then could attempt to locate a page in the document by xpath (if the koreader sync receives an xpointer for progress) and use that to update progress by page and percent, but then you have to figure out how to translate if you read in stump and then load in scrolling mode in koreader.

I think it's broadly sufficient to only sync the actual progress between kosync and stump if it's an actual page number and otherwise store the submitted xpointer (with a callout in the docs that stump reader <-> koreader sync does not work if you use the scrolling reader).

Breadcrumbs:

Starting with the sync module, it calls getLastProgress to determine the progress value: https://github.com/koreader/koreader/blob/fdd342de4016bd8a068ee24ab403caad5249a259/plugins/kosync.koplugin/main.lua#L590

For page-based viewing, it'll call: https://github.com/koreader/koreader/blob/fdd342de4016bd8a068ee24ab403caad5249a259/frontend/apps/reader/modules/readerpaging.lua#L159

Which leads to getTopPage: https://github.com/koreader/koreader/blob/fdd342de4016bd8a068ee24ab403caad5249a259/frontend/apps/reader/modules/readerpaging.lua#L647-L654

Which effectively returns the page number of the page visible at the top of the screen.

For rolling/scrolling views: https://github.com/koreader/koreader/blob/fdd342de4016bd8a068ee24ab403caad5249a259/frontend/apps/reader/modules/readerrolling.lua#L407

Which returns self.xpointer, which itself is updated as you scroll or rotate the viewport. The following doc comment explains why xpointer is used for the scrolling reader: https://github.com/koreader/koreader/blob/fdd342de4016bd8a068ee24ab403caad5249a259/frontend/apps/reader/modules/readerrolling.lua#L36-L53

avidal commented 1 month ago

Also, it looks like the "rolling" viewer is a "View Mode" selectable for any document. The user guide calls it "Continuous Mode".

aaronleopold commented 1 month ago

It looks like, generally speaking, for page-based documents, sync progress will be a page number... [or for scroll] an xpointer

This is really helpful! Thanks for doing some digging and for providing me with links to follow along!

I hadn't heard of x-pointer before, down the road it might be feasible to convert it to an epubcfi or something but for the time being I'll probably just do as you suggested:

I think it's broadly sufficient to only sync the actual progress between kosync and stump if it's an actual page number and otherwise store the submitted xpointer

aaronleopold commented 1 month ago

I knocked out a good bit of the API key management UI while on the plane today. I think this is pretty close to getting a very untested and unstable image out. If that is something anyone would be willing to test, please react and/or respond to this message.

Obligatory sneak peeks

I haven't tested the actual sync handlers for the koreader router I've created, but the middleware picks up and validates API keys successfully which is the big barrier to testing the actual sync API

aaronleopold commented 1 month ago

Anyone wanting to test koreader out can pull the unstable image and give it a go. There is a ENABLE_KOREADER_SYNC env variable you'll need to set to true.

You'll also need to do a fresh scan on a library where the generate_koreader_hashes config is set

aaronleopold commented 1 month ago

I've minimally tested it and haven't hit any issues! The percentage KoReader sends seems to align with the progress tracking in the built-in reader which is good. An example of that progress value is /body/DocFragment[10]/body/p[53]/text().157.

arunoruto commented 4 weeks ago

For the most part, for sure. After really giving the sync API a read through and seeing how small it is, I didn't want to get too bogged down architecting a solution for future integrations that may never get developed - I feel it can be easy to over-think and lose productivity at times. This isn't to say the example integration (kobo-api) you mentioned would never come! But I am okay with a bit of refactoring down the line if juggling multiple integrations becomes a serious organizational/maintenance problem.

Sorry to chime in! As for the kobo-api, from #361, it seems like there is a rust implementation called kobink. There is not much documentation, but it should be a good starting point, if that feature ever is considered to be implemented!

aaronleopold commented 4 weeks ago

Sorry to chime in!

That's what these issues are for! 🙂

Thanks for sharing the example for the kobo api! I'm definitely selfishly interested in eventually adding kobo since I will at some point get one. If you don't mind, please feel free to create a separate issue for it to be tracked a bit more neatly/isolated

stumpapp / stump

[FEATURE] KOReader sync implementation #239