omnivore-app / obsidian-omnivore

Obsidian plugin to fetch articles and highlights from Omnivore
MIT License
842 stars, 82 forks

Highlight location not accurate #131

Open hanconel opened 1 year ago

hanconel commented 1 year ago

I am sending my book highlights, made in PDFs, to Obsidian. However, the highlights that arrive in Obsidian are not in the same order as in the PDF in Omnivore, even though I set Highlight Order in the settings to use the location of the highlights in the article. Some of the links also jump to the cover page of the document.

MarkBouk commented 1 year ago

+1 to this issue. The issue seems to occur if I sync a PDF, add additional highlights to the document, then sync again. Many of the highlights appear at the beginning of the note instead of in the order that they appear in the PDF.

tprotopopescu commented 1 year ago

+1 too. I find that sorting by time saved also does not result in the right order. For instance, the latest highlight I took shows up in the middle of the file rather than at the end. As far as I can see this happens only with PDFs. I tried it on highlights from an RSS feed item and the reordering worked both ways (after resyncing everything).

sofianbello commented 1 year ago

+1 here.

I am just a hobbyist with barely any knowledge, but I believe that the issue mainly relates to PDFs (or maybe external files in general) and is a bug in the main Omnivore application rather than in the plugin. Highlights that originate from websites (articles and so on), or more precisely from hypertext, seem to work fine: the application correctly reorders highlights that were created later but belong to earlier text locations.

It seems like the PDF reader in Omnivore works differently. For example, it only offers one highlight color, while with hypertext you can choose from up to 4 highlight colors.

The best workaround that I know of at this point is to carefully add highlights and notes in order. In my experiments all highlights were synced in order until I either added a new highlight in an earlier text passage or added notes after some of the highlights already existed.

Also after using ChatGPT to search for a solution I was finally provided with the following:

Given your concern about the order of highlights, especially in PDF documents, the issue might lie within how these functions interact or how they handle the text nodes and highlights' structure. The functions that deal with the 'patch' generation and application, as well as those handling the text node traversal (like getTextNodesBetween, generateDiffPatch, and selectionOffsetsFromPatch), could be crucial here.

If there's a bug affecting the order of highlights, it might be due to how these patches are applied, how text nodes are interpreted between the start and end points of highlights, or how highlights are constructed within the document's structure. Debugging should likely focus on these areas, particularly on the data being passed between these functions and the outputs they generate.

I hope this might help to resolve the issue.

Edit: Nothing changed, just grammar and spelling corrected.

antithetic commented 12 months ago

Same issue here. It seems to be affecting PDFs primarily!

tprotopopescu commented 10 months ago

After recent Omnivore updates, it seems that ordering by the time the highlights are updated now works. On the other hand, ordering by location of the highlights never gives the correct ordering. To reproduce: upload a PDF, then highlight a passage on page 1, then page 2, then page 3, in that order. Ordering by time saved exports the highlights in that order. Ordering by location always exports them in a different order. This now happens on every PDF I have tried it on, whereas before it worked sometimes.
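
For anyone debugging this, here is a minimal TypeScript sketch of what "order by location" is expected to do for the page 1, 2, 3 reproduction above. It is not the plugin's actual code: the `Highlight` shape, the `positionPercent` field, and both function names are hypothetical, chosen only for illustration. It also shows one possible failure mode, which is a guess and not confirmed anywhere in this thread: if PDF highlights arrive without a usable position and a missing position is treated as 0, they all land at the top of the note.

```typescript
// Hypothetical shape of a synced highlight. Field names are assumptions
// for illustration, not the plugin's confirmed data model.
interface Highlight {
  id: string;
  // Position in the document as a percentage (0 to 100); may be missing.
  positionPercent?: number | null;
  // ISO 8601 timestamp of the last update.
  updatedAt: string;
}

// Returns a usable numeric position, or null when none is available.
function position(h: Highlight): number | null {
  const p = h.positionPercent;
  return typeof p === "number" && Number.isFinite(p) ? p : null;
}

// Naive comparator: a missing position is treated as 0, so every
// highlight without a position sorts to the beginning of the note.
function naiveSortByLocation(highlights: Highlight[]): Highlight[] {
  return [...highlights].sort(
    (a, b) => (a.positionPercent ?? 0) - (b.positionPercent ?? 0)
  );
}

// Defensive comparator: positioned highlights come first, ordered by
// position; highlights without a position follow, ordered by time.
// Ties on position are also broken by time.
function sortByLocation(highlights: Highlight[]): Highlight[] {
  return [...highlights].sort((a, b) => {
    const pa = position(a);
    const pb = position(b);
    if (pa !== null && pb !== null && pa !== pb) return pa - pb;
    if ((pa === null) !== (pb === null)) return pa === null ? 1 : -1;
    return Date.parse(a.updatedAt) - Date.parse(b.updatedAt);
  });
}

// Made-up sample data: three highlights on pages 1 to 3, plus a later
// one that arrived without a position.
const sample: Highlight[] = [
  { id: "page2", positionPercent: 40, updatedAt: "2024-01-01T10:01:00Z" },
  { id: "page3", positionPercent: 80, updatedAt: "2024-01-01T10:02:00Z" },
  { id: "noPosition", positionPercent: null, updatedAt: "2024-01-02T09:00:00Z" },
  { id: "page1", positionPercent: 5, updatedAt: "2024-01-01T10:00:00Z" },
];

console.log(naiveSortByLocation(sample).map((h) => h.id));
console.log(sortByLocation(sample).map((h) => h.id));
```

With the sample data, the naive version puts `noPosition` first, while the defensive version keeps the page order and appends the unpositioned highlight at the end. Comparing the raw position values returned for a PDF against those for a web article would show whether something like this is actually happening.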

MiracleXYZ commented 9 months ago

+1 to this issue. The order is messed up especially in PDF documents.

0xstepit commented 9 months ago

+1, the issue is still present.

Tchecker67 commented 4 months ago

Any news? The issue is still present.

martin-kakazu commented 3 months ago

The issue still exists. It also affects the Logseq plugin.

Chris-May-WS commented 1 month ago

The odd ordering of highlights in PDF files persists for me.

Tchecker67 commented 1 month ago

Still present here too. Thanks to ytaras, who seems to be trying to solve it! 👏

Chris-May-WS commented 1 month ago

Hear, hear! I realize one issue I might have is that there hasn't been a release in a few months. It may already have been fixed, but I haven't been able to install the latest code via a manual download to check.