johnfactotum / foliate

Read e-books in style
https://johnfactotum.github.io/foliate/
GNU General Public License v3.0
6.33k stars 290 forks source link

Add Annotation Marks #1101

Closed gmou3 closed 12 months ago

gmou3 commented 1 year ago

Hi johnfactotum,

I have added annotation marks at the navbar that appear at the bottom. My initial goal was to have them colorful and consistent with the chosen colors of the annotations. However, it seems that the library does not support colorful marks, so I placed them at the bottom to differentiate between section and annotation marks.

The annotations marks are not accurate and can be quite off because I get the location when I save them, instead of using the more accurate cfi (don't know how to transform it into a fraction).

Let me know what you think about this functionality, as I have other ideas I would also like to experiment with. Thanks!

johnfactotum commented 1 year ago

Yeah, it's not possible to turn CFIs into locations. In fact the precise location of any range within a section might fluctuate slightly depending on the viewport (though the total and the starting location of each section, which are based on byte sizes, would stay the same). Having to save an inaccurate location number with each annotation feels decidedly suboptimal.

One might be able to make locations precise if one could get the string offset of a range in a document but I'm not sure if that'd be possible with the current DOM APIs. Even if one somehow manages to do that, it's likely that it wouldn't be very performant, and it would still require fully loading the document, both of which means you still have to save the offset together with the annotation rather than being able to recompute them every time from the CFI, which is a bit less than ideal.

It would be a nice feature to have, but from a technical point of view, not really worth it IMHO.

gmou3 commented 1 year ago

I think I managed to do it with high precision by using the progress bar's get_value(). This seems to return a very precise fraction.

It still requires to save one extra entry for each annotation, but I don't see an issue with that. It probably can be avoided, but then, as you said, this may affect the performance, because you will have to do regular traversals of the document to locate the annotations through their cfi and then to get the fraction.

johnfactotum commented 1 year ago

Yes, the fraction is more precise than the location. But it's still the same thing. Just that locations are rounded according to the location size (currently 1500 bytes). It's calculated from the index of the visible column and the total number of columns in the rendered page (or the scroll offset in scrolled mode). This is only a rough number indicating the currently visible portion of the page, which is related to annotations only in that normally you can only add annotations to parts of the page that are visible (so you could end up with annotations that appear earlier in the book having greater progress fraction value than those that appear later if you resize the window). So it's doubly inaccurate.

gmou3 commented 1 year ago

Isn't this a practically negligible issue? I suppose almost everybody doesn't wildly change the zoom level. In fact saving the location instead of the cfi might be also preferable because it is more aligned with how you viewed the annotated text (i.e., it is probably more centered in the page instead of the top).

Anyway, you don't seem particularly excited about this functionality. But if you would like me to try doing it based on the cfi or based on some other recommendation please let me know.

johnfactotum commented 1 year ago

Isn't this a practically negligible issue? I suppose almost everybody doesn't wildly change the zoom level.

The whole point of reflowable formats is that they are reflowable by design. It's not really a good idea to assume that things won't be reflowed.

In fact saving the location instead of the cfi might be also preferable because it is more aligned with how you viewed the annotated text

The problem with locations is that the location is neither tied to the view nor the underlying HTML page. Rather it's a frankenstein locator that tries to approximate the byte offset through the view.

it is probably more centered in the page instead of the top

A "getVisibleRangeCenter" function or something similar will probably be added to the paginator at some point, which will be useful not only for centering anchors in scrolled mode, but will also help with showing page numbers on the verso side in paginated mode.

doing it based on the cfi or based on some other recommendation

If you could make it based on the CFI it would be far better than basing it on locations. But I just don't think that's practicable.

Doing this with Epub.js could be possible, as its locations are based on CFIs. But Epub.js has to load every section to generate the locations, which is slow. And to make it more precise you'd want to pick a small location size, which will make it even slower and the cache files bigger.

In foliate-js I chose to make it a rough estimate based on the sizes of the sections instead. This means that the calculation is instant and doesn't require caching the results. But the tradeoff is that locations are no longer exact, and shouldn't be used for anything other than giving you a very rough idea of your reading progress.

It might not be impossible to get the exact string offset of a given DOM range in a document. I think there are some DOM implementations with the ability to get the source string offsets of nodes, but as far as I know not with the Web APIs. That means you either have to write your own DOM parser (!), or you'd probably need to walk through each node to build a map from nodes to offsets and handle the offsets within a node carefully, accounting for numeric and entity character references. That's a lot of work just to show marks on the progress bar. (I suppose one could argue that the offsets could be useful for consuming the annotations in another reading system that doesn't support CFIs and doesn't even have a fully functioning DOM, but again, it's a lot of work for a very narrow if not nonexistent use case. And for this particular use case one would be better off using a standalone tool to convert from CFIs to offsets.)

Fundamentally, it's sort of a trilemma:

gmou3 commented 12 months ago

Thank you for the detailed response. Indeed saving inaccurate locations is ugly. I am trying a different approach where I visit all annotations through their CFIs, which enables me to get corresponding fractional values. This is only done upon initialization, the idea being that the fractions will only be updated when the user reopens a book.

The problem now is that this initialization step is time consuming if there exist many annotations. Ideally this should be implemented as a background process (using overlays?), so that the user doesn't have to wait any longer after the book is loaded.

johnfactotum commented 12 months ago

Getting the fraction when opening the book isn't really better because reflows can happen while the book is opened.

And the performance penalty of having to render the whole book is unacceptable, especially considering that this is a nice to have, but not essential, feature. In fact the only reason why I started writing my own EPUB library was that with Epub.js you must always load (not render, just load) the entire book on startup, which is slow and consumes a lot of memory.

(Rendering the whole book proved unacceptable even for Epub.js, which used to have a method to calculate the total number of pages, which requires rendering the whole book, but then they switched to using locations based CFIs, by generating CFIs for characters offsets in text nodes, which still requires loading, but not rendering every section.)

gmou3 commented 12 months ago

I think it is an important improvement to load the marks at startup, as you may reopen the book on a different screen or you may change your usual view settings. As I see it, it is acceptable to not be fully precise if you change the view in the same session. The performance penalty can be bad indeed (especially if one has lots of annotations), but -- given no solution -- I am still willing to accept it. Personally, I find this feature very nice because it embeds your notes in a way which is geometrically meaningful (in contrast to the list of the sidebar).

In any case, I will keep this feature in a personal fork, and you can of course feel free to close the PR. It is a bit unfortunate however that others won't enjoy this. Thank you for your time and for your efforts, I have been reading a lot more inspired by foliate!

johnfactotum commented 12 months ago

Thanks. I'll close this for now, then.