FR Idea: Sync Zotero Annotations

ryanwwest commented 7 months ago

Zotero's Web API v3 allows syncing annotations, which in Zotero include annotations for PDF, EPUB, and HTML (a pretty large subset of what KOReader allows annotating too). Zotero's annotations, which use the open Web Annotation Data Model (WADM) schema, are actually quite similar to KOReader's: there is a start and stop position for a highlight, a copy of the highlighted text, a 'style' (colors for Zotero vs. highlighted/underlined/strikethrough/invert for KOReader), and a few more fields. Both also support basic text notes anchored to highlights.

I think there is sufficient feature overlap that there could be a translation layer that displays Zotero annotations as KOReader annotations when the Zotero attachment is opened in KOReader (after browsing and selecting). Eventually, modifying/adding/deleting annotations in KOReader and syncing the changes back to the Web API would be ideal, but read-only would be a first step. This would make annotations much more valuable, as they would no longer be isolated to one system. (Ideally, a universal file format based on WADM would be better, but annotations shared between two widely supported programs are better than annotations stuck in one.)

There are some issues to overcome with annotation translation:

KOReader highlights are positioned differently from Zotero's (e.g. a different (0,0) origin, a different scale, and more). I tried to figure this out a few months ago but didn't get there.
KOReader highlight styles are primarily intended for black-and-white screens while Zotero highlights are only stylized by color (and RGB color is allowed, but the UI picker only shows ~9 options). My initial idea has been to map the first four preset Zotero colors to the four KOReader black-white-friendly options, but KOReader color highlighting may be added soon and partially solve this.
KOReader's bookmark and highlight structure in Lua docsettings are a bit confusing.
Zotero area selections and hovering notes (not anchored to text) don't have a counterpart in KOReader (maybe add them as a KOReader FR in the future?). Technically neither do Zotero item notes (as opposed to attachment notes), but those can probably be ignored. Zotero will probably add more features, and only some will be KOReader-recognizable.
Would need to decide how to deal with offline Zotero annotation storage in KOReader, though I think the plugin itself needs to be further developed first.
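
The first two points above (positioning and style) can be sketched roughly as follows. This is only an illustration under stated assumptions, not working plugin code: it assumes Zotero stores PDF rectangles as `[x1, y1, x2, y2]` with the origin at the bottom-left of the page, that KOReader wants `x, y, w, h` boxes with the origin at the top-left, and that a single uniform `scale` factor relates the two. The hex color values and the KOReader style names in the mapping are placeholders chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical mapping of four Zotero preset colors to four KOReader styles.
# Both the hex values and the style names are assumptions for illustration.
COLOR_TO_STYLE = {
    "#ffd400": "lighten",     # yellow -> plain highlight
    "#ff6666": "underscore",  # red    -> underline
    "#5fb236": "strikeout",   # green  -> strikethrough
    "#2ea8e5": "invert",      # blue   -> inverted
}
DEFAULT_STYLE = "lighten"  # fallback for any other RGB color


@dataclass
class Box:
    """Top-left-origin box (assumed KOReader-like layout)."""
    x: float
    y: float
    w: float
    h: float


def color_to_style(color: str) -> str:
    """Map a Zotero hex color to a style name, ignoring letter case."""
    return COLOR_TO_STYLE.get(color.lower(), DEFAULT_STYLE)


def rect_to_box(rect, page_height: float, scale: float = 1.0) -> Box:
    """Convert a bottom-left-origin [x1, y1, x2, y2] rect to a top-left box.

    The y axis is flipped using the page height, then everything is
    multiplied by `scale` to account for differing units.
    """
    x1, y1, x2, y2 = rect
    left, right = min(x1, x2), max(x1, x2)
    bottom, top = min(y1, y2), max(y1, y2)
    return Box(
        x=left * scale,
        y=(page_height - top) * scale,
        w=(right - left) * scale,
        h=(top - bottom) * scale,
    )


if __name__ == "__main__":
    # A 100x12 highlight near the top of a US Letter page (792 pt tall).
    print(rect_to_box([100, 700, 200, 712], page_height=792))
    print(color_to_style("#FFD400"))
```

A real translation layer would likely need more than a flip and a scale (page rotation and crop boxes, for example), which may be part of the "more" mentioned above.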

Here's an example set of highlights in Zotero on top, the same in KOReader on bottom, followed by the data structure used for the two Zotero highlights which I think is accessible via Web API:

I might be able to help build this at some point, but I don't currently have the time. I have the beginnings of a translation layer in Python but it's early on and needs to be in Lua anyway.

antrmn commented 4 months ago

KOReader's bookmark and highlight structure in Lua docsettings are a bit confusing.

I wonder if this is still true in the latest version of KOReader (2024.07).

CobriMediaJulien commented 2 months ago

I hope colored highlights come soon. With the new color e-ink displays, that would be awesome.

ryanwwest commented 2 months ago

@CobriMediaJulien I think the next KOReader release might add color highlights (https://github.com/koreader/koreader/issues/9024).

stelzch commented 1 month ago

I have made some progress with annotation support in the recent pre-release.

Annotations are only synced one way (from KOReader to Zotero), and edits/deletes are not supported.

mcrosson commented 1 month ago

Does the update also allow for sending any epub or html annotations to zotero or is it just limited to pdf?

Also: is there a way to prevent sync until we are 'done reading and annotating'? This approach would allow us to make annotations on-device and only send them once (at the 'end') to Zotero to help avoid any potential need to reconcile changes and deletes on the Zotero side of the sync.

mergen3107 commented 1 month ago

Thank you @stelzch !

Does that mean that newer uploads will overwrite existing uploads?

mergen3107 commented 1 month ago

And is the original PDF getting overwritten, or an upload is a separate PDF?

stelzch commented 1 month ago

@mcrosson

Does the update also allow for sending any epub or html annotations to zotero or is it just limited to pdf?

Currently it is limited to PDFs. I have not looked into how annotations of HTMLs are stored, neither in KOReader nor in Zotero, but if it can be easily supported I might look into it.

is there a way to prevent sync until we are 'done reading and annotating'?

The sync is triggered manually by choosing "Synchronize" in the Zotero menu, so the workflow you describe should be possible. For your convenience you can also assign a gesture to trigger the synchronization.

stelzch commented 1 month ago

is the original PDF getting overwritten, or an upload is a separate PDF?

The PDF files are not modified at all. Annotations are read from the accompanying metadata file in the .sdr sidecar directory and created using the Zotero API. This means they only show up in Zotero's built-in PDF reader and not in an external PDF reader.
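
For readers unfamiliar with this approach, here is a rough sketch of what "creating an annotation using the Zotero API" could look like. It is not the plugin's actual code (which is written in Lua). The field names follow Zotero's annotation item schema as best understood here and should be checked against the Web API documentation; the function name, the parent key, and the sample values are hypothetical. The real API may also require fields not shown, such as a sort index.

```python
import json


def build_annotation_item(parent_key, text, comment, color, page_index, rects):
    """Build one annotation item as a plain dict, ready to be JSON-encoded.

    Field names are assumptions based on Zotero's annotation item schema.
    `rects` is a list of [x1, y1, x2, y2] rectangles on the given page.
    """
    return {
        "itemType": "annotation",
        "parentItem": parent_key,          # key of the PDF attachment item
        "annotationType": "highlight",
        "annotationText": text,            # copy of the highlighted text
        "annotationComment": comment,      # the note anchored to the highlight
        "annotationColor": color,
        "annotationPageLabel": str(page_index + 1),
        # The position is stored as a JSON string inside the JSON item.
        "annotationPosition": json.dumps(
            {"pageIndex": page_index, "rects": rects}
        ),
    }


if __name__ == "__main__":
    item = build_annotation_item(
        parent_key="ABCD2345",             # hypothetical attachment key
        text="an example highlighted sentence",
        comment="my note",
        color="#ffd400",
        page_index=0,
        rects=[[100, 700, 200, 712]],
    )
    # A client would send a JSON array of such items in a write request.
    print(json.dumps([item], indent=2))
```

Because the annotation lives as a separate child item of the attachment, the PDF bytes on disk never change.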

stelzch commented 1 month ago

The word 'upload' might be a bit misleading here. I did not want to use 'synchronize', since that implies accepting changes in both directions, if that makes sense.

ryanwwest commented 1 month ago

This means they only show up in Zotero's built-in PDF reader and not in an external PDF reader.

I prefer this approach. Original file stays unmodified, and in either Zotero or KOReader, the user still has the option to export the annotations to the file itself if they wish. As the files are otherwise unmodified, the file hash doesn't change so hash-based metadata storage in KOReader still works (https://github.com/koreader/koreader/issues/10892).

mcrosson commented 1 month ago

Currently it is limited to PDFs. I have not looked into how annotations of HTMLs are stored, neither in KOReader nor in Zotero, but if it can be easily supported I might look into it.

I could live without HTML for a while. My bigger need would be ePub, as I have a lot of ePub content I hope to annotate over time. Is ePub a format you'd be willing to look into?

The sync is triggered manually by choosing "Synchronize" in the Zotero menu, so the workflow you describe should be possible. For your convenience you can also assign a gesture to trigger the synchronization.

The menu is sufficient for my needs. Thank you for the clarity.

ryanwwest commented 1 month ago

I believe EPUB and HTML use the same annotation schema in KOReader, so by supporting one you support both (unless something has changed). PDF has a slightly different schema since it is a fixed layout.

stelzch commented 1 month ago

Is ePub a format you'd be willing to look into?

In principle yes, but I can not make any promises about when that may be.

mcrosson commented 1 month ago

Is ePub a format you'd be willing to look into?

In principle yes, but I can not make any promises about when that may be.

That's totally fair and valid. Thank you for providing info and insights related to this feature. I appreciate it.

anaxonda commented 2 weeks ago

@stelzch maybe this discussion on the Zotero forum can be of use: https://forums.zotero.org/discussion/comment/474458#Comment_474458.

stelzch commented 2 weeks ago

So the main problem for EPUB boils down to KOReader using XPointer to represent annotation positions, whereas Zotero uses CFI. A conversion is only possible with the document loaded and parsed in memory.

Zotero seems to support different selector types already. If they added an XPointer selector to their codebase, supporting EPUB annotations should be easy.
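
To illustrate why the conversion needs the parsed document, here is a deliberately simplified sketch. It assumes an XPath-like path such as `body/p[2]` (the second `p` among its siblings) and converts it to CFI-style steps, where each element child is numbered 2, 4, 6, ... among all element children regardless of tag name. The function name and the toy document are made up for the example, and it ignores namespaces, text offsets, the spine step of a real CFI, and KOReader-specific path prefixes.

```python
import re
import xml.etree.ElementTree as ET

STEP = re.compile(r"^(\w+)(?:\[(\d+)\])?$")


def xpath_to_cfi_steps(root, path):
    """Convert a path like 'body/p[2]' (relative to root) to '/4/8'-style steps.

    Each step must look at the actual children of the current node, which is
    why the document has to be parsed first.
    """
    node = root
    steps = []
    for part in path.strip("/").split("/"):
        match = STEP.match(part)
        if match is None:
            raise ValueError(f"unsupported step: {part!r}")
        tag = match.group(1)
        nth = int(match.group(2) or 1)
        children = list(node)  # element children only
        same_tag = [i for i, child in enumerate(children) if child.tag == tag]
        if nth < 1 or nth > len(same_tag):
            raise LookupError(f"no such element: {part!r}")
        position = same_tag[nth - 1]      # 0-based index among ALL elements
        steps.append(2 * (position + 1))  # CFI numbers elements 2, 4, 6, ...
        node = children[position]
    return "".join(f"/{step}" for step in steps)


if __name__ == "__main__":
    doc = "<html><head/><body><h1>T</h1><p>a</p><div/><p>b</p></body></html>"
    root = ET.fromstring(doc)
    print(xpath_to_cfi_steps(root, "body/p[2]"))  # -> /4/8
    print(xpath_to_cfi_steps(root, "body/p[1]"))  # -> /4/4
```

The same `p[2]` would yield a different CFI step if the `h1` or `div` siblings were absent, so the path string alone is not enough to do the conversion.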

stelzch / zotero.koplugin

FR Idea: Sync Zotero Annotations #13