weirdNox / org-noter

Emacs document annotator, using Org-mode
GNU General Public License v3.0

Incremental extraction of highlights #94

Closed nanjigen closed 1 year ago

nanjigen commented 4 years ago

I am trying to implement an 'incremental reading' system using org-noter, org-brain and org-drill (as well as Anki).

Currently I read a given PDF, highlighting portions of the text I want extracted. I then extract these highlights using org-noter-create-skeleton and add a :drill: tag to the subsequent tree. I org-drill these items, slowly whittling them down and eventually exporting them to Anki.

This process works well initially, when that first extraction occurs with org-noter-create-skeleton. However, subsequent extractions create new skeletons containing all previous highlights, so I need to dig through the tree to find the new ones.

I wonder if it's possible to extract only the highlights that aren't already in the org file and append them to the end of the tree? To throw a spanner in the works, I'm using @fuxialexander's org-pdftools and their pdf-notes-booster branch, which gives precise locations, as @weirdNox of course already knows.

UndeadKernel commented 4 years ago

I was just looking into something like this: I want to add an annotation to an opened PDF and then have a way to import that annotation into org noter.

I was thinking of implementing the new function org-noter-sync-annots, which would only export annotations that are not present in the headline under the point. A simple way to do this would be to modify org-noter-create-skeleton to also add to the annotations' property drawer the id returned by the function pdf-info-getannots. This id is supposed to be unique to all annotations in the same PDF.

This way, for each annotation, we can check whether an item with a property holding its id is already present. When the id is not present, we can then just export the new annotation (or somehow add it in sorted order). This also has the advantage of not requiring any changes to pdf-tools.
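The id check described above can be sketched independently of Emacs. This is only a minimal illustration of the logic, written in Python rather than Emacs Lisp: the data shapes (dicts with an `id` key, headings with a `properties` dict) and the property name `ANNOT_ID` are assumptions made for the example, not anything org-noter or pdf-tools defines.

```python
def new_annotations(pdf_annotations, notes_headings, id_property="ANNOT_ID"):
    """Return the annotations whose id is not yet recorded in any heading.

    pdf_annotations: list of dicts, each with an "id" key (hypothetical shape).
    notes_headings:  list of dicts, each with a "properties" dict that models
                     an Org property drawer (hypothetical shape).
    """
    # Collect every id already stored in the notes; headings written by hand
    # have no id property and are simply ignored.
    known_ids = {h["properties"].get(id_property) for h in notes_headings}
    known_ids.discard(None)
    # Keep the original order so new items can be appended as they appear.
    return [a for a in pdf_annotations if a["id"] not in known_ids]


annotations = [
    {"id": "annot-1-0", "page": 1, "text": "first highlight"},
    {"id": "annot-2-0", "page": 2, "text": "second highlight"},
]
headings = [
    {"title": "first highlight", "properties": {"ANNOT_ID": "annot-1-0"}},
    {"title": "a manual note", "properties": {}},
]
to_append = new_annotations(annotations, headings)
```

Here only the second highlight would be appended, since the first one already has a heading carrying its id.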

I can probably program this but I was wondering if this is something that you @weirdNox would be interested in having. If yes, I can work on a PR.

weirdNox commented 4 years ago

Hello there! Sorry for "ghosting" this project; org-noter reached a point for me that is very stable and has all the features I need (I use it every day!), and I don't really have much free time anyway...

With that said, @UndeadKernel if you feel that you can do it, go ahead and suggest a pull request! The basic functionality of dumb syncing is easy enough, like you said. However, you mentioned that you would also use org-noter-create-skeleton, which is annotations->org, besides doing the export org->annotations. If you really want to sync both ways, there may be some problems, as you could run into data loss due to overwrites somewhere (i.e., if both the org heading and the annotation change before syncing again). Maybe you can use the diffing utilities Emacs already has built in.

Also, I believe that this is what issue https://github.com/weirdNox/org-noter/issues/27 is about, so it would be (at least) 2 issues with a single pull request! :D
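To make the overwrite risk concrete, here is a rough sketch of one way a two-way sync could detect that case. It assumes a digest of the text is stored at the last sync and compared against both sides; that stored digest, the function names, and the action labels are all hypothetical and are not something org-noter does.

```python
import hashlib


def digest(text):
    """Stable fingerprint of a text, stored at the time of the last sync."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_action(base_digest, org_text, annot_text):
    """Decide what to do with one heading/annotation pair.

    base_digest is the fingerprint recorded at the last sync (an assumption
    of this sketch); the two texts are the current state of each side.
    """
    org_changed = digest(org_text) != base_digest
    annot_changed = digest(annot_text) != base_digest
    if org_changed and annot_changed:
        # Both sides were edited: overwriting either one loses data,
        # unless they happen to have converged on the same text.
        return "in-sync" if org_text == annot_text else "conflict"
    if org_changed:
        return "push-to-annotation"
    if annot_changed:
        return "pull-to-org"
    return "in-sync"


base = digest("original text")
```

Only the `"conflict"` case would need a diff or a prompt to the user; the other cases can be resolved by copying one side over the other.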

Ypot commented 4 years ago

Would it be possible to have annotations/highlights grouped by page? It's a bit annoying to have each highlight of a page under a different heading. Or maybe group them by quarters of the page, for those who want to use precise note locations.
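As a sketch of what such grouping could look like, the snippet below buckets annotations by page and, optionally, by horizontal band of the page. It assumes each annotation carries a `page` number and a relative vertical position `top` between 0 and 1; both field names are made up for this illustration.

```python
from collections import defaultdict


def group_annotations(annotations, bands=1):
    """Group annotations by (page, band).

    bands=1 groups by page only; bands=4 groups by quarter of the page.
    "top" is assumed to be a relative vertical position in [0, 1].
    """
    groups = defaultdict(list)
    for a in annotations:
        # Clamp so that top == 1.0 falls in the last band, not past it.
        band = min(int(a["top"] * bands), bands - 1)
        groups[(a["page"], band)].append(a)
    # Order the groups by page then band, and each group top to bottom.
    return {
        key: sorted(items, key=lambda a: a["top"])
        for key, items in sorted(groups.items())
    }


highlights = [
    {"id": "a", "page": 2, "top": 0.9},
    {"id": "b", "page": 1, "top": 0.3},
    {"id": "c", "page": 1, "top": 0.1},
    {"id": "d", "page": 1, "top": 0.8},
]
by_page = group_annotations(highlights)
by_quarter = group_annotations(highlights, bands=4)
```

Each key would then become one heading, with the highlights of that group listed under it in reading order.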