Closed sati-bodhi closed 1 year ago
I am working on a collective fix, which you can see in PR #51. Perhaps we can look at this issue you report here after we have merged it to the main branch.
You could help me by using the branch named dev/round-off and see if it changed anything for this problem. But you don't have to. I will leave the branch for a couple days for now to see if there is any glaring issue I have missed.
@nobiot I've forked your repo but can't find a way to access the dev/round-off
branch with it.
It happens with static ebook buffers which don't allow editing as well.
Have you managed to use dev/round-off branch?
I have a suspicion about what might be the cause; if it is correct, I have fixed it. The problem was the org-remark-save
terminates in the middle of dolist
with an error, and thus the highlights after the error won't be updated correctly.
I'm testing the file you provided above (https://github.com/nobiot/org-remark/issues/50#issuecomment-1374345403). I cannot reproduce the issue.
I'm also testing with a larger text file (Alice in Wonderland, text version, https://www.gutenberg.org/files/11/11-h/11-h.htm). It has 2698 lines. I have no problem.
Is there anything you do to make the problem happen?
Have you managed to use dev/round-off branch?
I wasn't able to test the new branch - git clone
downloads just the main branch for testing. I've forked your repo without unchecking the "main branch only" option.
I have yet to discover a systematic cause for the problem; but it happens to Chapter 11 of this ebook as well.
You can read it with nov using my pull request #54 version.
Here are my config.el
settings that work with the pull request commit.
;; compatibility with org-noter
(defun org-noter-get-epub-source ()
"Returns the path of the epub source from which the present session is initiated."
(if org-noter--session
(concat
(org-noter--session-property-text org-noter--session)
"/"
(file-name-base (cdr (aref nov-documents nov-documents-index))))
nil))
(add-to-list 'org-remark-source-find-file-name-functions 'org-noter-get-epub-source)
Highlights beyond "Infant Coping Behaviors" org-remark-beg: 5239
are consistently displaced by 40 points upon visiting a new chapter, adding notes and returning to it.
I have yet to discover a systematic cause for the problem; but it happens to Chapter 11 of this ebook as well.
You can read it with nov using my pull request #54 version. Here are my
config.el
settings that work with the pull request commit.;; compatibility with org-noter (defun org-noter-get-epub-source () "Returns the path of the epub source from which the present session is initiated." (if org-noter--session (concat (org-noter--session-property-text org-noter--session) "/" (file-name-base (cdr (aref nov-documents nov-documents-index)))) nil)) (add-to-list 'org-remark-source-find-file-name-functions 'org-noter-get-epub-source)
Highlights beyond "Infant Coping Behaviors"
org-remark-beg: 5239
are consistently displaced by 40 points upon visiting a new chapter, adding notes and returning to it.
This epub compatibility issue could be related, but could also be a different issue altogether. At least I am finally getting a (somewhat) consistent error at my side that is worth investigating.
@nobiot, I've tested out the new dev/round-off
branch with my work. The highlights have been stable thus far.
Kudos for your work!
Great. Thanks for reporting back and good to hear it has been stable for real use like yours. I will first merge the dev/round-off later this week without the nov.el
support first. As noted, Then I think we can work on yet another branch I just created dev/nov.el
(branching off of the latest dev/round-off
) just to contain the development efforts of nov.el
.
Sorry to move you from one to another; I appreciate the fact you are new to Git and GitHub -- I'm also not used to using it to work with others at such a high pace like this. But I think nov.el
support is not too far off after round-off
.
@sati-bodhi I think I now know highlights are displaced only after "Infant Coping Behaviors" org-remark-beg: 5239.
It's the <table>
tag that underlying HTML uses for TABLE 11.1 Infant Coping Behaviors.
Try having a narrow frame of Emacs and then open the same document in nov.el
.
The program pads spaces when it renders a table in different ways according to the size of the frame, so the positions beyond the place where a table is rendered is not consistent if you change the viewing size...
Compare the two images below. I have turned on whitespace-mode
to see the impact of padding but probably unnecessary. Just look at the line number displayed on the left fringe for the first raw of the table where you see "1. SIGNAL: Infant acts in a way that function to modify mother’s behavior".
Line number 68 vs 113. The points at the end of the raw are different; this should be obvious as the narrow screen has more lines.
This explains the reason why only the highlights after this table gets displaced.
I am taking these screen shots on the latest dev/nov.el
branch, which incorporates all the fixes in the dev/round-off
branch.
I am not sure what would be the best course of action for this. Perhaps we might need to resort to storing and checking the beginning and ending strings to automatically adjust the points as you suggested before. At the moment, my capacity is running out.
For now, you would need to read epub books in a consistent Emacs frame size to keep the highlights consistent -- I believe that's what you have been doing as you say highlights have been consistently displayed thus far.
As nov.el
uses shr.el
to render HTML, I believe this is the relevant function:
;; Table rendering is the only complicated thing here. We do this by
;; first counting how many TDs there are in each TR, and registering
;; how wide they think they should be ("width=45%", etc). Then we
;; render each TD separately (this is done in temporary buffers, so
;; that we can use all the rendering machinery as if we were in the
;; main buffer). Now we know how much space each TD really takes, so
;; we then render everything again with the new widths, and finally
;; insert all these boxes into the main buffer.
(defun shr-tag-table-1 (dom)
I think I have now know out highlights are displaced only after "Infant Coping Behaviors"
Yes, I've recently noticed this as well.
When I set nov-text-width
to 80 and below, it slightly displaces the highlights under that table by 1 point - nothing as drastic as before, so I guess we can make-do with the present set up at it is.
Wonder how Calibre marks and keep record of its highlights with epubcfi
behind the scene. Here's the a single record in the annot_data
column of the annotations
table in metadata.db
.
annot_data | {"end_cfi": "/2/4/224/2/1:29", "highlighted_text": "Beyond the Pleasure Principle", "notes": "Beyond the Pleasure Principle lies the Death Instinct.", "spine_index": 11, "spine_name": "text/part0010.html", "start_cfi": "/2/4/224/2/1:0", "style": {"kind": "color", "type": "builtin", "which": "yellow"}, "timestamp": "2022-10-05T06:22:08.295Z", "toc_family_titles": ["Part One: Origins", "2 Sigmund Freud: The Drive/Structure Model"], "type": "highlight", "uuid": "0AMX_848HXmFJRNANJGFew"}
My guess is the last 2 digits in start_cfi
and end_cfi
represent the y
(i.e. line number) and x
(i.e. column number, a single character each) positions in the rendered epub document, which should be affected by the above situation as well, without some form of remedy with the code that does the processing.
Curiously, besides highlighted_text
above, there is a separate searchable text
column which documents the same string.
searchable_text | Beyond the Pleasure Principle
This design seem to fit my conjecture about
storing and checking the beginning and ending strings to automatically adjust the points
over here https://github.com/nobiot/org-remark/issues/50#issue-1523394821
You can use the current dev/nov.el
branch and try out the new feature to let Org-remark automatically adjust the position of highlights based on the new "org-remark-original-text" property.
You'd need to do two things:
org-remark-nov-mode
minor modeFirst, I have introduced the minor mode and all the technical set up is hidden away from the end user within it. Loading the file does not change the behaviour of the system. I do the following; you can do something similar
(add-to-list 'load-path (expand-file-name "org-remark" my/repos))
(require 'org-remark-global-tracking)
(org-remark-global-tracking-mode +1)
(with-eval-after-load 'nov
(require 'org-remark-nov)
(org-remark-nov-mode +1))
Second, "org-remark-original-text" property is added to the notes only when you create a new notes. For your existing notes in Chapter 11 after the table, you'd need to manually add the original text prop.
The logic should work even when a text spreads across two lines.
But it does not smartly look at different styles of quotation mark. In your notes that you gave me, you have "infant’s coping behavior" highlighted, but the heading text has "Infant's Coping Behaviour" with a different apostrophe; you would need to have the exact match to the body of the text in the nov-mode
buffer
Second, "org-remark-original-text" property is added to the notes only when you create a new notes. For your existing notes in Chapter 11 after the table, you'd need to manually add the original text prop.
Wonder if it is feasible to combine org-remark-link
and org-remark-original-text
into a single property which holds the values from both as a single link: [[nov:/home/sati/Calibre Library/Ed Tronick/The Neurobehavioral and Social-Emotional Development of Infants and Children (Norton Series on (1040)/The Neurobehavioral and Social-Emotional D - Ed Tronick.epub::17:58794][Infant's coping behavior]]
.
The link-label (original value of org-remark-original-text
) can be extracted with:
(let ((s "[[link-target][link-label]]"))
(string-match org-bracket-link-regexp s)
(match-string 3 s))
Presently, the default label given by org-insert-link
is a very long string that does not provide any new or critical information we need: EPUB file at /path/to/file.epub
.
Replacing it with the highlighted search text would make the properties drawer less cluttered and more concise - assuming users don't tend to make highlights that are too long; long strings do not look good on headings also.
To keep things clean, we could truncate the search string to no more than, say, 10 words for exceptional cases. But this may complicate matters, especially when working with CJK languages where word boundaries are not marked by spaces.
Wonder if it is feasible to combine
org-remark-link
andorg-remark-original-text
into a single property which holds the values from both as a single link:
My current hunch is that they should be separate properties. Link descriptions can be overridden by the user, and probably should be. I think having a separate, dedicated original-text
property makes the purpose explicit and clear (after documentation).
Presently, the default label given by
org-insert-link
is a very long string that does not provide any new or critical information we needReplacing it with the highlighted search text would make the properties drawer less cluttered and more concise
This is a good point and I will consider the suggestion of truncating the string. I think this can be done independent of the original-text
discussion above.
I personally have many highlights with a long string (a whole paragraph, for instance). This hasn't bothered me but your suggestion makes sense.
Usually, I hide the property drawers so "clutter" does not really bother me when I'm working with the marginal notes... But that's just my way. I am happy to see a real use case that's different from mine.
You can use the current
dev/nov.el
branch and try out the new feature to let Org-remark automatically adjust the position of highlights based on the new "org-remark-original-text" property.You'd need to do two things:
* Enable `org-remark-nov-mode` minor mode * Manually add "org-remark-original-text" property to the notes
First, I have introduced the minor mode and all the technical set up is hidden away from the end user within it. Loading the file does not change the behaviour of the system. I do the following; you can do something similar
(add-to-list 'load-path (expand-file-name "org-remark" my/repos)) (require 'org-remark-global-tracking) (org-remark-global-tracking-mode +1) (with-eval-after-load 'nov (require 'org-remark-nov) (org-remark-nov-mode +1))
Second, "org-remark-original-text" property is added to the notes only when you create a new notes. For your existing notes in Chapter 11 after the table, you'd need to manually add the original text prop.
The logic should work even when a text spreads across two lines.
But it does not smartly look at different styles of quotation mark. In your notes that you gave me, you have "infant’s coping behavior" highlighted, but the heading text has "Infant's Coping Behaviour" with a different apostrophe; you would need to have the exact match to the body of the text in the
nov-mode
buffer
It happened that the text after the table somehow got displaced again, this time by 70 points. Is it sufficient to just set org-remark-original-text
to the target, or do I have to manually adjust org-remark-beg
and org-remark-end
as well?
If it is temporary adjustment, original-text should be enough. Org-remark looks for the string two paragraphs before and after to adjust the location.
Changing beg+end is a “permanent” move — moving the base location, if you will.
If it is temporary adjustment, original-text should be enough. Org-remark looks for the string two paragraphs before and after to adjust the location.
Changing beg+end is a “permanent” move — moving the base location, if you will.
What happens if there are multiple instances of the same string in that region?
It will temporarily move to the first match found.
The position adjustment as described above has been implemented and released on v1.2.1. Closing this issue.
Though I am unable to come up with a reproducible scenario for this bug upfront, I've had experience of highlights being displaced when a relatively long (>600 lines) document is edited.
I have worked on similar issues before and the solution (or hack) I came up with is to perform a search for the precise string in question on the line or relevant region itself to double-check start-end positions whenever the document is edited.
This would probably take up (significantly) more computing resources, especially when highlights are many. But it fixes the problem.
I have not figured out why relative regions run awry easily, even when changes are traceable and meticulously synchronized.