noembryo / KoHighlights

KOHighlights is a utility for viewing KOReader's highlights and/or exporting them to simple text, HTML, CSV or Markdown files.
MIT License

Highlight page number inconsistency #38

Closed edo-jan closed 2 weeks ago

edo-jan commented 1 month ago

Context: As we discussed previously, KOReader's page numbering depends on the device resolution, font size, etc. (unless it uses publisher-provided page numbers). While waiting for a permanent solution from the devs, I have installed a "logical page count" user patch from 2-logical-page-count.lua.
This works well across all devices tested (i.e. under Android, Kobo and Linux). Also, the devs implemented Markdown export based on the logical page numbers (see the closed issue at the bottom of the page).

Current issue: When I sync my annotations, the page numbers in KOHighlights are all over the place, while they are displayed correctly (i.e. with the same logical numbers) across my devices.

Enhancement requested: I wonder if it would be possible to implement logical page number support in KOHighlights and use this option for Markdown export.

noembryo commented 1 month ago

Current issue: When I sync my annotations the page numbers in Kohighlights are all over the place while they are being displayed correctly (i.e. with the same logical numbers) across my devices

I don't think that this is the case anymore. There are a couple of things changed since our last conversation.

As for the user patch, I don't understand what that code snippet does. There is no logic that I can recognize there since I never used a user patch, but I suspect that there is some code hidden somewhere that I can't find. Please, enlighten me..

edo-jan commented 1 month ago

I don't think that this is the case anymore. There are a couple of things changed since our last conversation.

You are correct. I was looking at the old archived view. The pages are now displayed in order. However the numbers are different from those displayed by the devices with the user patch turned on.

As for the user patch, I don't understand what that code snippet does. There is no logic that I can recognize there since I never used a user patch, but I suspect that there is some code hidden somewhere that I can't find. Please, enlighten me..

I wish I could. Unfortunately, I don't know much about the patch code other than the fact that it works ... lol... Maybe we can ask the guy who developed it. My issue is that the pages displayed in Kohighlights views are different from those displayed by the devices. If we only could harmonize them

noembryo commented 1 month ago

However the numbers are different from those displayed by the devices with the user patch turned on.

I can't help with this, since I don't know how the patch works.

My issue is that the pages displayed in Kohighlights views are different from those displayed by the devices. If we only could harmonize them

.. but this, I can't understand. You're saying that after the sync, and after at least one opening on the device, the highlights' page numbers are still wrong when you see them in KOHighlights? This cannot be the case since, as I told you, KOReader recalculates the page numbers..

edo-jan commented 4 weeks ago

My issue is that the pages displayed in Kohighlights views are different from those displayed by the devices. If we only could harmonize them

.. but this, I can't understand. You're saying that after the sync and after at least one opening with the device, the highlights page number are still wrong if you see them with KOHighlights? This can not be the case, since, as I told you, KOReader recalculates the page numbers..

Let me make it a bit clearer. What I meant is that KOHighlights displays page numbers that differ from those displayed by the devices when the user patch is installed and the "use reference page numbers" feature is checked. These "reference" page numbers are exactly the same across all my devices, and the bookmarks also display the "reference" page for each bookmark. Now, when I load the .sdr into KOHighlights, it shows different page numbers for the highlights / annotations.

edo-jan commented 4 weeks ago

Here is the screenshot illustrating the above:
pages2024-08-14_003139

noembryo commented 4 weeks ago

Yes, I see. But unfortunately, since I don't know how to get these logical page numbers, I'm afraid I can't help you with this.. 😔

edo-jan commented 4 weeks ago

Let me try to find out more information about this. On a related note, I remember there was some talk about KOReader's devs considering something similar to logical page numbers. Do you know if there's been any movement on this?

edo-jan commented 3 weeks ago

Hi noembryo, I contacted the author of the user patch and he responded (please read it here). I hope you will find the information he provides helpful.

noembryo commented 3 weeks ago

Hmm.. Unfortunately, that means that it uses a KOReader function that relies on the CRengine to get the text of the book and calculate the page based on that. The only way to have these numbers is for KOReader to add them to the annotation's info (in the book's metadata).

edo-jan commented 3 weeks ago

A big thanks for trying! I have a silly question though: am I beating a dead horse? Is it possible that this was already solved by some other method and I (being not too technical) simply failed to realize this? Is there a way to have consistent page numbers on different devices / screen sizes / resolutions that are exportable to KOHighlights by using some other method?

noembryo commented 3 weeks ago

Is there a way to have consistent page numbers in different devices / screen sizes / resolutions that are exportable to kohighlights by using some other method?

Searching for these new logical (synthetic) pages, I read many conversations that are asking the same question. I think the answer is "none that I know of".

The only fast and simple way that I know of is the Adobe Digital Editions method (a page for each 1024 bytes of the book's file size). This is used in many readers for "fixed" page numbers, but it is highly inaccurate (the more images in the book, the more pages are calculated). I think that the ADE method is not supported in KOReader, and I believe rightly so.. KOReader's method is much better (more realistic), but I think only KOReader uses it.
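
The ADE-style estimate described above amounts to a single division. A minimal sketch, assuming the rule is exactly "one page per 1024 bytes of file size" as stated here; the function name is hypothetical, and real readers may measure the size differently:

```python
import math

BYTES_PER_PAGE = 1024  # assumption: taken from the description above


def ade_style_page_count(file_size_bytes: int) -> int:
    """Estimate a fixed page count from the book's file size alone."""
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / BYTES_PER_PAGE)


# Two books with identical text, one carrying an extra 2 MiB of images
text_only = ade_style_page_count(400 * 1024)
with_images = ade_style_page_count(400 * 1024 + 2 * 1024 * 1024)
print(text_only, with_images)
```

The second book gets far more "pages" without containing any more text, which is the inaccuracy mentioned above.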

I asked a dev if it's possible to add the calculated pages in the metadata, and I'm waiting for an answer, so maybe there is hope for the future 🥳 (or maybe not 🤷‍♂️).

edo-jan commented 3 weeks ago

🙏

noembryo commented 3 weeks ago

If you're not following the thread with the request, check this answer..

noembryo commented 3 weeks ago

Can you check whether your metadata files that have been processed with the user patch you're using (or even better, upload them here) contain a pageref key for every annotation, holding the correct page number, the one that you see inside KOReader?

edo-jan commented 3 weeks ago

Can you check your metadata files (or even better upload them here), that have been processed with the user patch that you're using, if they contain a pageref key for every annotation, that contains the correct page number, the one that you see inside KOReader?

The pageref keys are there and they contain the calculated reference page numbers! 🙏😍😍🙏 I am attaching the metadata file (the same book used in the screenshots above). You can see that the pageref numbers are the same as those shown in the bookmark view of the phone's screenshot.
metadata.epub.lua.zip
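
With the key present, a viewer can prefer the reference number and fall back to the rendered one. A minimal sketch, assuming each annotation has already been parsed from the metadata file into a Python dict with a `pageno` key (rendered page) and an optional `pageref` key; the parsing step, the sample values and the function name are hypothetical, not KOHighlights' actual code:

```python
from typing import Any, Dict, List


def display_page(annotation: Dict[str, Any], use_reference: bool = True) -> str:
    """Return the page label to show for one annotation."""
    pageref = annotation.get("pageref")
    if use_reference and pageref not in (None, ""):
        return str(pageref)
    # Fall back to the device-dependent page number
    return str(annotation.get("pageno", "?"))


annotations: List[Dict[str, Any]] = [
    {"text": "first highlight", "pageno": 310, "pageref": "81"},
    {"text": "second highlight", "pageno": 1290},  # no pageref stored
]

labels_ref = [display_page(a) for a in annotations]
labels_raw = [display_page(a, use_reference=False) for a in annotations]
print(labels_ref, labels_raw)
```

The fallback matters for metadata that carries no pageref at all.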

noembryo commented 3 weeks ago

Here is a beta version with reference page number support. Check if there is anything problematic about it..

edo-jan commented 3 weeks ago

Thanks for implementing! I can see the reference numbers are showing correctly once "Reference Numbers" is selected. The only issue I can see so far is the incorrect total page number displayed in the metadata under the Book view (the "Pages" field located next to the "Language" field). For example, the book for which I sent you the metadata file above shows a total of 2494 pages in all my readers. KOHighlights' "Pages" field for the same book shows 9600.

noembryo commented 3 weeks ago

Yes, I saw that too, but unfortunately there is no info for that in the metadata file..

edo-jan commented 3 weeks ago

As a [not too perfect] workaround consider hiding the "Page" field if the "Reference Numbers" option is checked. Just an idea

edo-jan commented 3 weeks ago

Here are a couple of other small issues.

  1. Note the screen artefacts during the start up
  2. The book list in the archive is not loaded during the start up. One needs to change the view from "Archived" to "Loaded" and back to "Archived" in order to see the books

https://github.com/user-attachments/assets/70f02543-daef-42a1-bb89-fb959e61653d

noembryo commented 3 weeks ago

As a [not too perfect] workaround consider hiding the "Page" field if the "Reference Numbers" option is checked. Just an idea

I can do that for now, but I don't like it that much.

  • Note the screen artefacts during the start up

Fixed.

  • The book list in the archive is not loaded during the start up. One needs to change the view from "Archived" to "Loaded" and back to "Archived" in order to see the books

Fixed.

New beta here..

edo-jan commented 3 weeks ago

Thank you, we are almost there!

2.0.4.2 issues:

Issue 1. The startup always defaults to the Highlights view. The desired behavior is to remember the last view, e.g. if the program was closed in the Books/Archived view, it should open there. Likewise, if it was closed in the Sync view, it should open in the Sync view.

Issue 2. The Highlights view columns can be rearranged, but the order is not remembered. E.g. I like to have "Page" as the first column, followed by Chapter, Highlight, and Comment. After the program restarts, it is back to the default.

Issue 3. The filter state under the Highlight view is not remembered. The desired behaviour would be to remember the last filter(s) applied and show the "Filter" button depressed that would indicate that there are some filters applied.

noembryo commented 3 weeks ago

Thank you we are almost there!

🤣🤣🤣🤣🤣

2.0.4.2 issues: Issue 1. The startup always defaults to the highlight view. The desired behavior is to remember the last view. E.g. if the program was closed in Book/Archived view, then it should open from there. Likewise, if was closed in Sync view should open in sync view.

And this is how it works, …except for the Sync View. The reason I disabled it there, is because, depending on the number of the groups you have, it can take some time until they are fully loaded. I will enable it for this beta to test it a little and see if I can leave it enabled..

Issue 2. The hilight view columns can be re-arranged but are not memorized. E.g. I like to have the "Page" to be the first column, followed by Chapter, Highlight, and Comment. After the program restarts it is back to the default.

Yes, I have thought about it when I first made the Highlights view too, and it was on the todo list for the future, but.. OK, done. 😉

Issue 3. The filter state under the Highlight view is not remembered. The desired behaviour would be to remember the last filter(s) applied and show the "Filter" button depressed that would indicate that there are some filters applied.

This is not how the filter works. The filter dialog is the same for all views and it gets reset when changing a view because it works in a different way depending on the view. When the dialog is open, the Filter button is pressed. When the filter dialog is closed, no filtering is applied and the Filter button is released (not pressed). That way, there is always a visible indicator for when filtering is happening.

The newer beta is here..

edo-jan commented 3 weeks ago

Thank you we are almost there!

🤣🤣🤣🤣🤣

🤩🤩🤩

I will test the newer beta later tonight. In the meantime, I have observed that the path specified in the Archived view sometimes changes for no apparent reason. Here is how you can replicate it, as far as I can tell.

Preconditions: a sync group exists for a book on two different devices (e.g. the first .sdr is on C:/ and the second is on F:/, i.e. a reader such as a Kobo).

  1. Go to the Books/Loaded view and load the .sdr from the C:/ drive.
  2. Archive the book (under the Archived view this should correctly show the path pointing to C:).
  3. Go to the Sync view, make sure "Sync with archived" is checked and run the sync for the group.
  4. Eject the reader and restart KOHighlights.
  5. Now go to Books/Archived. The path for the archived book has changed to the F:/ drive and a warning triangle appears with the note "epub file is missing".

The difficulty is that this does not happen 100% of the time, but eventually many of my sync groups end up showing the missing file warning for at least one or two books. I noticed that if no backup file existed before the first "sync with archive", the system creates an archive entry pointing to the connected device (in my case the F:/ drive). When the device is no longer connected, it then shows red. I am not too clear on why the "Archived" path shows the device directory in the first place. It is not where the archived file is stored, or am I wrong?

noembryo commented 3 weeks ago

Off the top of my head, when archiving after a sync, the version of the book that is copied over to the db is the most recent one. So, if the most recently modified version (from adding/editing annotations in the reader) is the one at F:\, then this is the one that is archived.. I'll check the code later, to be sure..

EDIT: Yes, it's the newest one that is archived.
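
The "newest one wins" rule can be sketched as below. This is an illustration only: the `BookVersion` type, its fields and the paths are hypothetical, and the real app may determine recency differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class BookVersion:
    path: str            # where this copy of the .sdr metadata lives
    modified: datetime   # last modification time of that metadata


def version_to_archive(group: List[BookVersion]) -> BookVersion:
    """Pick the most recently modified member of a sync group."""
    return max(group, key=lambda v: v.modified)


group = [
    BookVersion("C:/books/example.sdr", datetime(2024, 8, 20, 9, 0)),
    BookVersion("F:/books/example.sdr", datetime(2024, 8, 21, 22, 30)),
]
archived = version_to_archive(group)
print(archived.path)
```

Under this rule the archived path follows whichever device was edited last, which is why it can point to a removable drive.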

edo-jan commented 3 weeks ago

EDIT: Yes, it's the newest one that is archived.

Is there a reason to display the error about the missing file then? Or even the entire path given that this is not where the real archive is stored?

noembryo commented 3 weeks ago

There is not one "real archive" when you're syncing multiple files. All are real. There is no way that the app can predict that you're going to disconnect one drive in the future, so that it would store something else in the archive. It archives the newest, because this one probably has the most updated info in general. The path is displayed as info for the user; it can be ignored if the user does not want to open the book itself.

edo-jan commented 3 weeks ago

Ok, now I understand how it works. Do you think in the future (one day maybe) it would make sense to make the system create a separate reference file (i.e. a real archive or better call it a system copy) which would store the latest sync version on a book level? Among other benefits this would enable actually deleting highlights / annotations from all [edit: devices, not books] at once, when the user edits entries directly in the system copy. Currently, if I delete an entry from one book the syncing brings it right back... lol.

noembryo commented 3 weeks ago

Do you think in the future (one day maybe) it would make sense to make the system create a separate reference file (i.e. a real archive or better call it a system copy) which would store the latest sync version on a book level?

By "book level" you mean storing the book as well? For what purpose?

Among other benefits this would enable actually deleting highlights / annotations from all books at once, when the user edits entries directly in the system copy. Currently, if I delete an entry from one book the syncing brings it right back... lol.

I don't think that this will ever be possible, because when syncing two versions of a book, and one has an annotation that the other doesn't, there is no way for the app to know that the first one had this annotation at a previous time (say yesterday) and you since then deleted it, so it should delete the annotation in the version that currently has it.
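
The limitation can be shown with a toy merge. A minimal sketch, assuming a sync that keeps every annotation found on either side and has no record of earlier states; annotations are reduced to plain strings and the function name is hypothetical:

```python
from typing import Set


def merge_annotations(a: Set[str], b: Set[str]) -> Set[str]:
    """Two-way sync with no history: keep everything either side has."""
    return a | b


yesterday = {"note 1", "note 2"}
device_a = yesterday - {"note 2"}   # the user deleted "note 2" here
device_b = set(yesterday)           # untouched copy still has it

merged = merge_annotations(device_a, device_b)
print(sorted(merged))

# The same two inputs also arise when "note 2" was newly added on device B,
# so the merge cannot tell a deletion from an addition.
```

Telling the two cases apart would need a stored previous state (a three-way merge) or explicit deletion markers, neither of which is described in this thread.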

edo-jan commented 3 weeks ago

Do you think in the future (one day maybe) it would make sense to make the system create a separate reference file (i.e. a real archive or better call it a system copy) which would store the latest sync version on a book level?

By "book level" you mean storing the book as well? For what purpose?

Sorry I did not make it clear. I meant for every sync group. That is the system would create and store "system copies" of the metadata files containing the latest synched highlights / annotations for each book (that is one metadata file for each sync group).

Among other benefits this would enable actually deleting highlights / annotations from all books at once, when the user edits entries directly in the system copy. Currently, if I delete an entry from one book the syncing brings it right back... lol.

I don't think that this will ever be possible, because when syncing two versions of a book, and one has an annotation that the other doesn't, there is no way for the app to know that the first one had this annotation at a previous time (say yesterday) and you since then deleted it, so it should delete the annotation in the version that currently has it.

This is correct. But the idea is not to alter the current behavior. It is to add an editable system copy (e.g. to be editable in the Highlights view) which can be included into the sync group with an option to overwrite the metadata files on selected devices within the sync group. If one checks the "replace with system copy" option for specific device(s) in a group, that device's metadata file will be overwritten by the system copy. If one keeps the option unchecked (the default) the system copy will just store the latest version of the synced highlights for the sync group.

edo-jan commented 3 weeks ago

Moreover, this would enable a true cross device annotation editor one could use to edit / enhance annotations for the given book before exporting to an external app (such as Obsidian) for further processing. The resulting system copy could be used to overwrite the unedited versions on the devices...

noembryo commented 3 weeks ago

Sorry I did not make it clear. I meant for every sync group. That is the system would create and store "system copies" of the metadata files containing the latest synched highlights / annotations for each book (that is one metadata file for each sync group).

This is exactly what archiving does. It stores a copy of the .sdr metadata to the database.

But the idea is not to alter the current behavior. It is to add an editable system copy (e.g. to be editable in the Highlights view) which can be included into the sync group with an option to overwrite the metadata files on selected devices within the sync group. If one checks the "replace with system copy" option for specific device(s) in a group, that device's metadata file will be overwritten by the system copy. If one keeps the option unchecked (the default) the system copy will just store the latest version of the synced highlights for the sync group.

There is a dangerous idea there.. 💣, but an idea nevertheless. What you're proposing is an option to copy the archived metadata to all the items of a sync group. This is not that difficult to implement..
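
A rough sketch of such an option, assuming the archived metadata is available as a file and each group member is a metadata file on disk (in the app the archive lives in a database, so this is only an illustration). The function name and the ".bak" backup step are hypothetical additions, included because the operation is destructive:

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def overwrite_with_archive(archived: Path, targets: Iterable[Path]) -> List[Path]:
    """Replace each target metadata file with the archived copy.

    A ".bak" copy of every existing target is kept first.
    """
    backups = []
    for target in targets:
        if target.exists():
            backup = target.with_name(target.name + ".bak")
            shutil.copy2(target, backup)
            backups.append(backup)
        shutil.copy2(archived, target)
    return backups


# Demo with throwaway files standing in for a sync group
root = Path(tempfile.mkdtemp())
archived = root / "archive.lua"
archived.write_text("edited annotations")
device = root / "device.lua"
device.write_text("old annotations")

backups = overwrite_with_archive(archived, [device])
print(device.read_text(), "|", backups[0].read_text())
```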

noembryo commented 3 weeks ago

I will have to think about the implementation though, and it will probably have to wait for the next update, since after this update, I will be away from my PC for a while..

edo-jan commented 3 weeks ago

I will have to think about the implementation though, and it will probably have to wait for the next update, since after this update, I will be away from my PC for a while..

Thank you! This would really boost usability. In the meantime I am thinking of putting together a basic user guide describing some of the ways one can use Kohighlights. In my view many people out there have no idea that this tool exists / how it can address their needs.

noembryo commented 3 weeks ago

OK. I caved in and added it to the group's right-click menu.. Tomorrow I'll release it (if I didn't break anything else with all those changes.. :).

noembryo commented 3 weeks ago

In the meantime I am thinking of putting together a basic user guide describing some of the ways one can use Kohighlights. In my view many people out there have no idea that this tool exists / how it can address their needs.

That's very nice..

noembryo commented 3 weeks ago

Forgot to post the link to the new beta..

edo-jan commented 2 weeks ago

I have created a test group and am trying to add a book, which is not being recognized. It is displayed in red; it is the same book, but for whatever reason (based on a hash value?) it is being rejected. This is not the first time I am struggling with "false positives", likely because (this is my wild guess) the book might have been opened in the Calibre viewer, which, if I am not mistaken, stores some kind of metadata in the epub. Is there a simple way to fix this type of issue?

edo-jan commented 2 weeks ago

Just discovered that this happens when a book's settings in KOReader are changed via "reset document's settings to default". After this setting is applied to a book in KOReader, its metadata file loaded in KOHighlights is no longer recognized as a legit member of its sync group.

edo-jan commented 2 weeks ago

In both cases the unrecognized book is the one on the C:/ drive. In the group below it causes the other two to go red because it appears first in its sync group.
2024-08-22_232140

edo-jan commented 2 weeks ago

2024-08-22_232608

noembryo commented 2 weeks ago

This is not the first time I am struggling with "false positives" likely because (this is my wild guess) it might have been opened in a Calibre viewer which if I am not mistaken stores in the epub some kind of metadata. Is there a simple way to fix this type of issues?

There's a setting in Calibre to prevent the viewer from storing the current position inside the epub file (that's what it is doing). This is the only way to keep the file unmodified. Of course, you can also modify the book by editing it inside Calibre, but you would know that. After Calibre modifies the epub, there is no way that anyone can tell what modification happened (if it's a serious one or not).

Just discovered that this happens when a book's settings in Koreader is changed via "reset document's settings to default" After this setting is applied to a book in Koreader then its metadata file loaded in Kohighlights is no longer being recognized as a legit member of its sync group.

This action just deletes the .sdr folder and then re-creates it, using the book. That means that it will also recalculate the partial_md5_checksum that defines the book's ID. If the book had previously been modified, the new md5 will be different from the one before, hence the problem with pairing with the other versions of the metadata. You can check it yourself. If it's the same, then you must send me the problematic metadata, along with at least one of the normal ones..
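
To illustrate why a modified book file ends up with a different ID, here is a sketch of a checksum computed over a few sampled chunks of a file. The key name suggests that only part of the file is hashed, but the chunk size and offsets below are assumptions chosen for illustration, not KOReader's actual algorithm, and the helper names are hypothetical:

```python
import hashlib
import os
import tempfile


def sampled_md5(path: str, chunk: int = 1024) -> str:
    """MD5 over small chunks read at a few spread-out offsets.

    The offsets below are illustrative, not KOReader's actual ones.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for offset in (0, 1024, 4096, 16384, 65536):
            f.seek(offset)
            data = f.read(chunk)
            if not data:
                break
            digest.update(data)
    return digest.hexdigest()


def write_temp(data: bytes) -> str:
    """Write data to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


original = write_temp(b"A" * 100_000)
modified = write_temp(b"B" + b"A" * 99_999)  # one byte changed in a sampled chunk

id_original = sampled_md5(original)
id_modified = sampled_md5(modified)
print(id_original == sampled_md5(original), id_original == id_modified)
```

The unchanged file always yields the same ID, while a change inside any sampled chunk yields a different one, so metadata created before and after the change no longer pair up.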

edo-jan commented 2 weeks ago

Just discovered that this happens when a book's settings in Koreader is changed via "reset document's settings to default" After this setting is applied to a book in Koreader then its metadata file loaded in Kohighlights is no longer being recognized as a legit member of its sync group.

This action just deletes the .sdr folder and then re-creates it, using the book. That means that it will also recalculate the partial_md5_checksum that defines the book's ID. If the book was previously been modified, the new md5 will be different from the one before, hence the problem with pairing with the other versions of the metadata. You can check it yourself. If it's the same, then you must send me the problematic metadata, along with at least one of the normal ones..

Are you sure that the .sdr is deleted and recreated from the book? When "reset document's settings to default" is used on a book, all highlights and annotations are preserved. It simply applies a bunch of settings such as font size, margins, settings pertaining to status bars, etc. The name of this feature is somewhat misleading, but it is used in conjunction with "Save document settings as default" (see the screenshot above). For example, when I tweak the settings on a book to my liking, I use the latter feature to create a multi-setting default that I can use on other books if I want them to look exactly the same as my tweaked one. As I said, all pre-existing highlights are preserved intact.

I just restored the impacted metadata file from backup (attached as "C-original").
When I load the group with "C-original" the sync group is recognized as legit. "C-altered" contains the .sdr that resulted when the book was opened in KOReader and a couple of formatting settings were changed (specifically, I just changed the screen orientation). This immediately caused the .sdr to be rejected. However, the underlying epub was not altered in any way (I also attach it just in case). The two other files are from two other devices (one mapped to the T: drive and the other to the E: drive; they are attached in their respective folders).
sdrs.zip

noembryo commented 2 weeks ago

I just restored the impacted metadata file from backup (attached as "C-original"). When I load the group with "C-original" the sync group is recognized as legit. "C-altered" contains .sdr that resulted when the book was opened in Koreader and a couple of formatting settings were changed (specifically I just changed the screen orientation).

This is not the case. "C-original" is identical to "C-altered"

edo-jan commented 2 weeks ago

Somehow I messed it up. Here it is. This one is larger.
C-altered2.zip

edo-jan commented 2 weeks ago

To test this further, I deleted the book's .sdr on C:/ and reopened the book to recreate it to see if the new .sdr will be accepted by the other group members (E & T). It is also rejected! Please see if the following group works on your side: C-new + E + T

C-new.zip

noembryo commented 2 weeks ago

OK, this one has the problem you're describing. And the reason for it is simple.

As you know, there was a big update for KOReader (and KOHighlights) that changed the way highlights are stored in the metadata file. It introduced the "new" metadata format. Metadata created before this update are in the "old" format. Now, the problem.. When you open an old-format file with the updated version of KOReader and you reset the metadata, it changes the metadata format from old to new. This makes the metadata incompatible with the other, old-format metadata. KOHighlights can't sync/merge different formats together. It supports both the old and the new, but not mixed up.
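
A compatibility check along these lines could look like the sketch below. It assumes that parsed new-format metadata keeps its highlights under an `annotations` key, while the old format uses `highlight` and `bookmarks`; those key names, the handling of empty files and both function names are assumptions for illustration, not KOHighlights' actual code:

```python
from typing import Any, Dict, List


def metadata_format(meta: Dict[str, Any]) -> str:
    """Classify parsed metadata as "new", "old" or "empty"."""
    if meta.get("annotations"):
        return "new"
    if meta.get("highlight") or meta.get("bookmarks"):
        return "old"
    return "empty"


def can_sync(group: List[Dict[str, Any]]) -> bool:
    """A group is mergeable only if its non-empty members share one format."""
    formats = {metadata_format(m) for m in group} - {"empty"}
    return len(formats) <= 1


c_new = {"annotations": [{"text": "a highlight"}]}
e_old = {"highlight": {1: [{"text": "a highlight"}]}, "bookmarks": []}
t_old = {"highlight": {3: [{"text": "another"}]}, "bookmarks": []}

print(can_sync([e_old, t_old]), can_sync([c_new, e_old, t_old]))
```

In this sketch the C-new + E + T group fails the check because one member was converted while the other two were not.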

Another question that I would like to know, is what happens if you have this old format, open it with a current version KOReader and then create a new highlight. Will it convert all the highlights to the new format (and make it incompatible with the others) or write it in the old format? If you do something like this, please send me the resulting file.. 🙏

edo-jan commented 2 weeks ago

I have opened the book on all three devices and applied "reset document's settings to default" to force the conversion to the new format, and added some new comments as requested. Here is the good outcome: the books are now all being recognized as a group. However, the bad one is that the sync stopped working. When I try to sync, nothing happens (that is, the no-confirm bug is back). Here are all three files:
TEMP.zip

noembryo commented 2 weeks ago

Well, they sync OK here..

edo-jan commented 2 weeks ago

The book synced after I removed the cached version from the archive. But another book is not being recognized.... no luck with this one. However, here is a bigger fish to fry. It appears that the pageref key is missing from most books. I think that it is in fact present only where the publisher included it in the epub. The logical numbers user patch works correctly, and its output can be seen in the Reference page number list, as well as correctly displayed in place of page numbers. However, it is not making its way into pageref: most books I initialize with the newer version of KOReader don't show the pageref key in the metadata file...