nobiot / org-transclusion

Emacs package to enable transclusion with Org Mode
https://nobiot.github.io/org-transclusion/
GNU General Public License v3.0

Infinite loop when saving org file #177

Open pp5x opened 1 year ago

pp5x commented 1 year ago

What?

It seems org-transclusion has a bug when saving a file and rendering. When the file is saved for the first time in a freshly launched Emacs, there is no issue.

Then when I try to save the file a second time (without modification), Emacs goes into an infinite loop and becomes unresponsive.

I managed to make it stop by using pkill --signal USR2 emacs and got a backtrace of what it was doing (see screenshot). The backtrace seems to show an interaction between org-element and org-transclusion, which was writing #+transclude: lines without end (see the number of lines; my file is originally 200 lines long). It looks like this is related to the way org-transclusion saves files (mentioned in #109).

I suspect the bug is a bad interaction with org-element--cache-active-p, which grows very quickly. I noticed that running org-element-cache-reset can help when the file is opened again. Emacs also becomes unresponsive when I just move the cursor around the #+transclude: lines... :boom:

Screenshot from 2023-03-24 01-12-26

Doom Emacs

I am running doom-emacs:

generated  mars 24, 2023 01:26:09
system     NixOS 22.11.3299.9ef6e7727f4 (Raccoon) Linux 5.15.103 x86_64
emacs      28.2 ~/.dotfiles/emacs/.emacs.d/
doom       3.0.0-pre PROFILE=_@0 HEAD, master 4e105a95a 2023-03-22 18:29:38 -0400 ~/.doom.d/
shell      /run/current-system/sw/bin/bash
features   CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON LIBOTF
      LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
      PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XAW3D XDBE XIM XPM
      LUCID ZLIB
traits     batch server-running custom-file
modules    :config use-package :completion company vertico :ui doom doom-dashboard doom-quit
      (emoji +unicode) hl-todo modeline ophints (popup +defaults) treemacs vc-gutter
      vi-tilde-fringe workspaces zen :editor (evil +everywhere) file-templates fold
      (format +onsave) snippets :emacs dired electric undo vc :term vterm :checkers
      syntax (spell +flyspell) grammar :tools direnv editorconfig (eval +overlay) lookup
      lsp magit make :lang (cc +lsp) emacs-lisp markdown nix (org +journal +pretty) rst
      sh yaml :config (default +bindings +smartparens)
packages   (org-auto-tangle) (gift-mode :recipe (:host github :repo csrhodes/gift-mode :files
      (gift-mode.el))) (org-transclusion :recipe (:host github :repo
      nobiot/org-transclusion :branch main :files (*.el)))
elpa       vterm

Is there any chance we could solve this issue? I would love to be able to use this package to design my programming courses! Thanks!

nobiot commented 1 year ago

I’d love to fix the issue. Two questions:

I suspect there is something specific to Doom Emacs. I do not use it and I don’t know how to use it

nobiot commented 1 year ago

Also do you use the latest commit of org-transclusion? I can see you use the main branch but I do not see which commit.

nobiot commented 1 year ago

I tried doom emacs. I am not sure if this is helpful, but I cannot reproduce the issue. Sorry, it's really hard for me to understand what is going on on your end.

devcarbon-com commented 1 year ago

I've run into this a couple of times myself. I don't use doom, so I don't think it's exclusive to that.

I don't know exactly what caused it, but I suspect it had something to do with undo. (Or maybe undo-fu in particular).

If I can find a way to trigger this consistently I'll let you know.

nobiot commented 1 year ago

I don't know exactly what caused it, but I suspect it had something to do with undo. (Or maybe undo-fu in particular).

If I can find a way to trigger this consistently I'll let you know.

Thank you. I really struggle with this error; I'd really appreciate it if we can find a reproducible procedure.

I don't use undo-fu either.

I have encountered the error when I was developing features. In my experience, it was caused by the incorrect way the transclusion is removed and the original #+transclude: line is inserted back (related to the beg/end marks). I think I have managed to eliminate all the cases I have encountered -- this is reflected in the current code.

pp5x commented 1 year ago

Hi @nobiot Sorry I did not try to dig more in this issue. I would need to setup a blank emacs with the faulty file I've been using. If I can, I'll for sure try to improve this issue by identifying the reproduction steps.

I wanted to let you know of the issue, which is for now too problematic for me to adopt transclusion in my org production (as I have to almost kill emacs to get out of the infinite loop). I'd love to see this package included in doom distribution, it's a great community which could help you get lots of traction (they are on Discord to help).

To answer your previous question : I was on main 'up to date' 2d0502f6bd4b422dcb34d1ae99257d9456fc54f4.

devcarbon-com commented 1 year ago

@nobiot Could you point me to where in the code (or to the commit in which) you fixed the occurrences you know about?

I'm thinking to debug this we could throw in a check that sees if the buffer size is growing since save was triggered and if so throw an error so we can debug-on-error instead of freezing emacs.

devcarbon-com commented 1 year ago

So looking a little closer this should be simple to do. We already have org-transclusion-before-save-buffer and I can throw the check into org-transclusion-remove to capture local variables and experiment to help find the cause.
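A minimal sketch of that check, assuming hypothetical names (`my/ot-size-at-save`, `my/ot-growth-limit`, `my/ot-record-size`, `my/ot-check-growth`) that are not part of org-transclusion. The idea is to record the buffer size when the save starts and to signal an error once the buffer has grown past a limit, so that `debug-on-error` can produce a backtrace instead of Emacs freezing.

```elisp
;; Hypothetical debugging aid; none of these names exist in org-transclusion.
(defvar-local my/ot-size-at-save nil
  "Buffer size recorded when the current save started, or nil.")

(defvar my/ot-growth-limit 10000
  "Assumed maximum number of characters a buffer may grow during a save.")

(defun my/ot-record-size ()
  "Remember the current buffer size.  Meant for `before-save-hook'."
  (setq my/ot-size-at-save (buffer-size)))

(defun my/ot-check-growth ()
  "Signal an error if the buffer grew too much since the save began."
  (when my/ot-size-at-save
    (let ((growth (- (buffer-size) my/ot-size-at-save)))
      (when (> growth my/ot-growth-limit)
        (error "Buffer grew by %d characters during save; possible loop"
               growth)))))
```

One way to use it while experimenting: add `my/ot-record-size` to `before-save-hook` ahead of the org-transclusion hook, call `(my/ot-check-growth)` from inside the removal loop, and set `debug-on-error` to t.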

nobiot commented 1 year ago

In order to contain the issue of infinite loop, I have pushed commit 43c478c. I'd appreciate hearing from anyone who has bumped into the error message: "org-transclusion: Aborting. You may be in an infinite loop".

fix: heuristics to identify & break infinite loop on save

Reported in GitHub issues #109 #177.

I cannot reproduce the issue myself so far. I am putting in place (1) a small preventive measure and (2) heuristics to detect and break the infinite loop on save-buffer.

(1) Org-transclusion (OT) tries not to save the transcluded buffer content and instead saves only the #+transclude keyword line to the file. To achieve this, OT uses 'before-save-hook' and 'after-save-hook' to remove all the transclusions and then add them all back. This operation relies on the point returned by the 'org-transclusion-remove' function. In this commit, the point (integer) is changed to a marker. This way, any arbitrary buffer change between the remove-all and add-all processes has less impact on the points of reference: markers automatically move to adapt to the new buffer state. I suspect something like 'whitespace-cleanup' put in 'before-save-hook' might dislocate the positions in some situations. This preventive measure will hopefully preempt the issues.

(2) The heuristic is simple but should work if an unexpected number of loop iterations happens. Since it simply compares the length of a list with the number of iterations 'dolist' makes over the same list, logically this should be redundant; however, since the infinite loop itself is an anomaly to me, this heuristic might catch the issue and break the loop.

As you can see, neither attempt is based on causal analysis; both are "stabbing in the dark" heuristics.
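To illustrate point (1), here is a small self-contained sketch (not code from the commit; `my/marker-demo` is a hypothetical name) comparing a saved integer position with a marker after text is inserted earlier in the buffer. The integer keeps pointing at the old offset, while the marker follows the text it was set on.

```elisp
;; Illustration only; `my/marker-demo' is not part of org-transclusion.
(defun my/marker-demo ()
  "Return (SAME-P INT-OK-P MARKER-OK-P) after an insertion before point."
  (with-temp-buffer
    (insert "* OT test\n#+transclude: [[file:source.org]]\n")
    (goto-char (point-min))
    (forward-line 1)
    (let ((pos (point))             ; plain integer position
          (mkr (point-marker)))     ; marker at the same position
      (goto-char (point-min))
      (insert "extra line\n")       ; arbitrary change before the position
      (list
       ;; Do the integer and the marker still agree?
       (= pos (marker-position mkr))
       ;; Does the integer still point at the keyword line?
       (save-excursion (goto-char pos) (looking-at "#\\+transclude"))
       ;; Does the marker still point at the keyword line?
       (save-excursion (goto-char mkr) (looking-at "#\\+transclude"))))))

(my/marker-demo) ; => (nil nil t)
```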

japhir commented 7 months ago

Hi nobiot, thanks for the great package!

I'm also running into this issue on 1.3.2, likely because I use evil-mode and have set: (add-hook 'evil-insert-state-exit-hook 'my-save-if-bufferfilename)

Right now it's unfortunately unusable for me, as it repeats the links infinitely and I can't break out other than killing the emacs session. How can I try out this fix to see if it helps?

nobiot commented 7 months ago

@japhir Thank you.

I don't use evil mode and I can't reproduce the issue any longer.

The key is to reproduce the issue reliably so that we can analyze the code that causes the infinite loop.

What I can suggest is:

  1. Remove 'my-save-if-bufferfilename from the hook and see if you can reproduce the issue reliably

  2. Remove evil mode and see if you can reproduce the issue reliably

  3. Remove everything but evil mode and org-transclusion and see if you can reproduce the issue reliably

I have begun to suspect there may be some inconsistency with evil mode... but I can't use Vim keybindings so I am no use here.

devcarbon-com commented 7 months ago

@nobiot I've observed this behavior a few times, and I'm not using evil.

It seems to trigger when undoing just the right amount in the org file with the transclusions and then save.

I'll see if I can reproduce with emacs -Q.

nobiot commented 7 months ago

@devcarbon-com Thank you. I see, undoing never occurred to me. I am really curious to see if anyone can repro reliably with emacs -q...

Have you seen the infinite loop after commit 43c478c I mentioned above? It was merged in May 2023.

devcarbon-com commented 7 months ago

It was after that, only a couple months ago, but I'm not entirely sure if my version was current at the time.

devcarbon-com commented 7 months ago

Okay, I just reproduced this bug.

Emacs version:

GNU Emacs 29.1 (build 1, aarch64-apple-darwin21.6.0, NS appkit-2113.60
 Version 12.6.6 (Build 21G646)) of 2023-08-16

org-transclusion version: 1.3.2 installed via package-install after fresh install of emacs.

org file:

* OT test
#+transclude: [[./code.el::bar][bar]] :thing-at-point defun  :src elisp

code.el:

(defun bar ()
  (interactive)
  (message "hello"))

Steps to reproduce:

  1. Enable OT mode via org-transclusion-mode.
  2. org-transclusion-add with cursor at transclusion link.
  3. Save buffer.
  4. Undo exactly once.
  5. Save buffer.
  6. Bug is triggered, upon C-g org file looks like this:
#+transclude: [[./code.el::bar][bar]]  :src elisp
... (many more identical lines)
#+transclude: [[./code.el::bar][bar]]  :src elisp
#+begin_src elisp
(defun bob ()
  (interactive)
  (message "hello"))
#+end_src

devcarbon-com commented 7 months ago

(edited to be accurate)

nobiot commented 7 months ago

Steps to reproduce:

1. Enable OT mode via `org-transclusion-mode`.

2. `org-transclusion-add` with cursor at transclusion link.

3. Save buffer.

4. Undo exactly once.

5. Save buffer.

6. Bug is triggered, upon `C-g` org file looks like this:

@devcarbon-com Thank you for the steps. It's the first time I have seen concrete steps to reproduce. I feel one step closer to knowing what's really happening.

Now I have tried following the steps with emacs -q. I cannot follow your steps exactly (see below). I tried a little differently; no infinite loop was triggered.

  1. In terminal, type emacs -q. Emacs opens a splash screen
  2. Press "q" to move to the "scratch" buffer.
  3. CTRL-Y to yank the following setup script and evaluate the buffer
(add-to-list 'load-path "~/.config/emacs/elpa/org-transclusion-1.3.2.0.20230819.63913")
(load-library "org-transclusion")
(define-key global-map (kbd "<f12>") #'org-transclusion-add)
(define-key global-map (kbd "C-z") #'undo)
  4. C-x C-f to visit the test org file (infinite-loop.org)

    infinite-loop.org is identical with yours. code.el is identical with yours. setup.el is the setup script above to record the setup.

  5. M-x org-transclusion-mode. This already adds the transclusion (default, expected behaviour). Calling org-transclusion-add does not do anything

  6. Step 5 does not flag the buffer to "modified"; you cannot save the buffer (no change) -- this is expected

  7. C-z to call undo exactly once. The transclusion gets removed.

  8. You cannot save the buffer because there is nothing registered in undo (expected).

I cannot exactly follow your steps, so I tried this:

In the buffer org-transclusion-mode is already active. Transclusion has been removed.

  1. Change " OT Test" to " OT Test-changed" so that the buffer is marked as modified (ensure the buffer-modified flag is "on" while doing the rest)
  2. Move the cursor to the #+transclude: line.
  3. F12 (org-transclusion-add) to add
  4. Undo exactly once; the transclusion gets removed
  5. Ensure the buffer-modified flag has not changed and save the buffer (C-x C-s)

I cannot reproduce the infinite loop.

GNU Emacs 29.1.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.33, cairo version 1.16.0) of 2023-07-30

Org mode version 9.6.6 (release_9.6.6 @ /usr/local/share/emacs/29.1.50/lisp/org/)

I use Ubuntu with Wayland (default), so I compile my own Emacs with pgtk (pure gtk) feature on -- instead of the default gtk for Xorg

nobiot commented 7 months ago

  1. Enable OT mode via org-transclusion-mode.
  2. org-transclusion-add with cursor at transclusion link.

You can do these steps only when you have customized org-transclusion-add-all-on-activate to nil. The default is t. Did you do the steps with emacs -q or emacs -Q?

devcarbon-com commented 7 months ago

Ah, yes, I see where I was not clear now. Following your steps, I get the exact same behavior as you do, and do not get an infinite loop. The key difference seems to be using a file that is already written vs. writing one from scratch. Perhaps the timing of activating org-transclusion-mode matters too. Undo amalgamation may also play a part. At first I could not get consistent results, until I added an 'intentional mistake'.

  1. start emacs with emacs -q (sorry, -Q was by mistake earlier, not the command I ran).
  2. q from splash to scratch buffer.
  3. paste in your setup code and eval-buffer.
  4. find-file to new .org file.
  5. new file is empty. Enable org-transclusion-mode.
  6. type in (not paste) contents of infinite-loop.org, but intentionally "forget" the colon after #+transclude.
  7. org-transclusion-add. (user-error: Not at a transclude keyword or transclusion in a block at point). (intentional).
  8. Add colon.
  9. org-transclusion-add.
  10. save-buffer.
  11. undo. (If transclusion disappears, error will not be triggered. I get this when I don't add the intentional mistake at times).
  12. save-buffer.
  13. Loop is triggered, cancel via C-g

devcarbon-com commented 7 months ago

Note that saving the buffer in step 10 seems to also be a key step. If you leave this out, it works without issue (no loop, and the transclusion is removed on undo).

nobiot commented 7 months ago

@devcarbon-com Thank you for the detail! I can reproduce the infinite loop now. I need to spend some time to get my head around it, though. It's a great leap forward!

japhir commented 7 months ago

Omg, devcarbon-com managed to reproduce exactly what I was doing at the time that I encountered the crash, but I just didn't recall it anymore!

short description of what I was doing at the time

I was working on some data analysis in R, using org-mode and org-babel. At some point I started to accumulate a few too many functions, so I tangled the functions to separate files in the R subdirectory, so I could make a package out of it. After that, I wanted to work in those R files directly to make debugging and potential duplication errors easier to deal with. But I also liked having direct access to the functions from my org-mode file, so I wanted to transclude each of the functions that I had just tangled to a separate file.

It was the first time in a long time using org-transclusion, so I typed out the new `#+transclude [[file:R/a_function_for_this_R_package.R]] :src R` lines manually. Then realized I had forgotten the colon, and added them with some automatic intermediate saving. Maybe I hit undo at an unfortunate time and this caused the crash. I would not have been able to make a reproducible example but your description triggered my memory :smile:

nobiot commented 4 months ago

FYI With #177, I have added another workaround to minimize the possibility of infinite loop occurring.

The workaround is this:

This workaround is motivated by the observation that the infinite loop issue happens mostly because of the tiny misspelling by missing the colon ":". The hope is to minimize the experience of infinite loop.

-- I realize it's not fixing the root cause. I have gone back to it, but now I can reproduce the problem only intermittently (more reliably than before, but not always). When I see the infinite loop happening, I am not able to determine the root cause... It would be great if anyone out there has experience in fixing this type of issue in Emacs and can help us.

akashpal-21 commented 3 months ago

Minimal procedure to consistently get the bug:

emacs -q -D

(add-to-list 'load-path "~/Desktop/transclusion/org-transclusion")
(require 'org-transclusion)

(defun test-payload ()
  (interactive)
  (insert "#+transclude: [[file:source.org]]")
  (org-transclusion-add)
  )

(global-set-key (kbd "C-c d") #'test-payload)

Create two files: source.org, containing some basic structure to transclude, and target.org, a blank file. In target.org, execute #'test-payload.

Save, then undo (the transclusion is not removed).

Run 'org-transclusion-remove -- see problem.

Source of problem: after-save-hook org-transclusion-after-save-buffer

Need to undo twice instead of once.

Very rare to trigger and only occurs if undo after save.

Cannot replicate using undo-tree package.

Somehow the dolist inside the after-save-hook is messing with the default undo protocols --

Why infinite recursion?

org-transclusion-remove running without end because of

org-transclusion-before-save-buffer ()
  ...
        (org-transclusion-remove-all)))

akashpal-21 commented 3 months ago

Further note -- bug condition is so rare that emacs -q is not sufficient. Need emacs -q -D.

Nobody should trip over this bug under normal circumstances; it is probably triggered by users testing the program's functionality. It requires a very rare alignment of known circumstances.

Possible fix:

  1. No fix (best solution)
  2. Document as a known issue that, if the bug ever occurs, undo must be done twice (instead of once) after save to remove the transclusion properly. Under normal circumstances this bug will never be triggered.

nobiot commented 3 months ago

@akashpal-21 Incredible! Thanks a million! I don't think I can spend much time week days this week and this weekend may be a bit tight, but I want to squeeze some time to look closer at this!!

nobiot commented 3 months ago

Maybe, just maybe, you could also suggest a fix 🙏

akashpal-21 commented 3 months ago

I can give some ideas, but I lack the skill to build the requisite infrastructure. I have noted the following about text properties in my investigation: when undo corrupts them, it can corrupt them only partially, in which case we can still pretend nothing has gone wrong. As long as the beg and end markers are available, #'org-transclusion-remove will work.

I don't feel there is something in your code that creates this problem; it's Emacs undo bugging out in a very specific alignment of circumstances. I have checked with viewundo, and the problem doesn't occur, even with this minimal setup, when using the undo-tree.el package.

* view-undo outputs :: i = insert; p = text property -- etc see package for more legends.
'/' implies break in undo tree -- undo should be inverse of this
++ #'test-payload
(/dmdpppppppppppppppppppppppppppppppppppppppppppppppppppppppi?) 

++ text-props during save as it passes through `#'org-transclusion-remove`
Variables:
beg: 1
end: 17
keyword-plist: (:link [[file:source.org]] :current-indentation 0)
indent: 0
keyword: #+transclude: [[file:source.org]]
tc-pair-ov: #<overlay from 1 to 17 in source.org>

(/dmdmmmmpppppppppppppppppppppppppppppppppppppppppppppppppppppppi?

 #'save-buffer **(undo tree is flipped wrongly)** (?) Older entries should stay below
 id/ddpppppppppppppppppppppppppppppppppppppppppppppppppppppppi?)

Undo

**text properties is corrupted**
(/idmmmmm?dmmmmpppppppppppppppppppppppppppppppppppppppppppppppppppppppipppppppppppi?

 /dmdmmmmpppppppppppppppppppppppppppppppppppppppppppppppppppppppi?
 **undo tree should work like this -- preserve succession**
 id/ddpppppppppppppppppppppppppppppppppppppppppppppppppppppppi?)

++ Text props as it passes through `#'org-transclusion-remove` which it cannot process
Variables:
beg: 1
end: 1
keyword-plist: (:link [[file:source.org]] :current-indentation 0)
indent: 0
keyword: #+transclude: [[file:source.org]]
tc-pair-ov: #<overlay in no buffer>
progn: Recursion Warning! Text Properties corrupted!

front-sticky t line-prefix [Show] local-map [Show] org-transclusion-beg-mkr #<marker at 1 in target.org> org-transclusion-end-mkr #<marker at 17 in target.org> org-transclusion-orig-keyword (:link "[[file:source.org]]" :current-indentation 0) org-transclusion-pair #<overlay from 1 to 17 in source.org> org-transclusion-type "org-link" read-only t rear-nonsticky t wrap-prefix [Show]


* **Fully corrupted: beg = end + tc-pair = null**

Text content at position 1:

There are text properties here: front-sticky t line-prefix [Show] local-map [Show] org-transclusion-beg-mkr #<marker at 1 in target.org> org-transclusion-end-mkr #<marker at 1 in target.org> org-transclusion-orig-keyword (:link "[[file:source.org]]" :current-indentation 0) org-transclusion-pair # org-transclusion-type "org-link" read-only t rear-nonsticky t wrap-prefix [Show]


* **Semi Corrupted: beg & end data retained, but tc-pair = null**

Text content at position 1:

There are text properties here: front-sticky t line-prefix [Show] local-map [Show] org-transclusion-beg-mkr #<marker at 1 in target.org> org-transclusion-end-mkr #<marker at 17 in target.org> org-transclusion-orig-keyword (:link "[[file:source.org]]" :current-indentation 0) org-transclusion-pair # org-transclusion-type "org-link" read-only t rear-nonsticky t wrap-prefix [Show]

In all three cases orig-keyword (link) is preserved

Possible solution

1. Detect corruption by marker beg = end
2. Recover by regenerating the text property from link and set-text-property 

Recovery needs to be here

(defun org-transclusion-remove ()
  "Remove transcluded text at point.
When success, return the beginning point of the keyword re-inserted."
  (interactive)
  (if-let* ((beg (marker-position (get-char-property (point)
                                                     'org-transclusion-beg-mkr)))
            (end (marker-position (get-char-property (point)
                                                     'org-transclusion-end-mkr)))
            (keyword-plist (get-char-property (point)
                                              'org-transclusion-orig-keyword))
            (indent (plist-get keyword-plist :current-indentation))
            (keyword (org-transclusion-keyword-plist-to-string keyword-plist))
            (tc-pair-ov (get-char-property (point) 'org-transclusion-pair)))
      (progn
        (when (eq beg end)
          (progn
            (message "Recursion Warning! Text Properties corrupted!")
            ;; Restore beg and end text property before processing
            (org-transclusion-restore-tp keyword)))
        (let ((mkr-at-beg

               ;; --------------

               ))))))


We need to create a function #'org-transclusion-restore-tp that can do this. But this is a skill issue for me.

akashpal-21 commented 3 months ago

My rationale turned out to be wrong -- I can replicate this problem easily; it isn't a rare bug but a consistent one. It seems the undo tree is storing the changes in two steps during save. I will do more testing to figure out a solution if possible.

akashpal-21 commented 3 months ago

I tested multiple solutions playing around with buffer-undo-list -- there is a new macro in Emacs (with-undo-amalgamate) -- but none of them seems to stop the issue. I think the correct solution would be to stop the before-save and after-save hooks from modifying the list itself -- users should be put back to just how it was before a save --

I settled for restoring the value after the hooks, which seems to give promising results: buffer-undo-list isn't hammered with entries every time a save-buffer is initiated.

Please comment on it when you get some time

(defun org-transclusion-before-save-buffer ()
  "Remove transclusions in `before-save-hook'.
This function is meant to clear the file clear of the
transclusions.  It also remembers the current point for
`org-transclusion-after-save-buffer' to move it back."
  (let ((undo-list buffer-undo-list))
    (setq org-transclusion-remember-point (point))
    (setq org-transclusion-remember-transclusions
          (org-transclusion-remove-all))
    (setq buffer-undo-list undo-list)))

(defun org-transclusion-after-save-buffer ()
  "Add transclusions back as they were `before-save-buffer'.
This function relies on `org-transclusion-remember-transclusions'
set in `before-save-hook'.  It also move the point back to
`org-transclusion-remember-point'."
  (unwind-protect
      (progn
        ;; Assume the list is in descending order.
        ;; pop and do from the bottom of buffer
        (let ((do-length (length org-transclusion-remember-transclusions))
              (do-count 0)
              (undo-list buffer-undo-list))
          (dolist (p org-transclusion-remember-transclusions)
            (save-excursion
              (goto-char p)
              (org-transclusion-add)
              (move-marker p nil)
              (setq do-count (1+ do-count))
              (when (> do-count do-length)
                (error "org-transclusion: Aborting. You may be in an infinite loop"))))
          ;; After save and adding all transclusions, the modified flag should be
          ;; set to nil
          (restore-buffer-modified-p nil)
          (when org-transclusion-remember-point
            (goto-char org-transclusion-remember-point))
          ;; Restore `buffer-undo-list'
          (setq buffer-undo-list undo-list)))
    (progn
      (setq org-transclusion-remember-point nil)
      (setq org-transclusion-remember-transclusions nil))))

Preliminary tests are fine.
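The save-and-restore pattern used in both functions could also be factored into a small helper. The sketch below is only an illustration: `my/with-undo-list-preserved` is a hypothetical macro, not part of org-transclusion. It assumes that the body leaves the buffer text exactly as it found it; if the text differs afterwards, the restored undo entries would no longer match the buffer.

```elisp
;; Hypothetical helper; not part of org-transclusion.
(defmacro my/with-undo-list-preserved (&rest body)
  "Run BODY, then restore `buffer-undo-list' to its prior value.
Only safe when BODY leaves the buffer text as it was."
  (declare (indent 0) (debug t))
  (let ((saved (make-symbol "saved")))
    `(let ((,saved buffer-undo-list))
       (unwind-protect
           (progn ,@body)
         ;; Discard whatever undo entries BODY recorded.
         (setq buffer-undo-list ,saved)))))
```

Because the restore sits in `unwind-protect`, the undo list is put back even if the body signals an error partway through.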

akashpal-21 commented 3 months ago

I think the underlying problem is that Emacs splits the large entry in buffer-undo-list during its normal garbage collection process -- in general, the list is hammered with entries by the hooks during a save.

nobiot commented 3 months ago

Thank you so much for spending the time on this thorough investigation. At a glance, your general direction makes sense to me. Let me also spend some time on my end. Thank you again! I could not have gotten this far without you @akashpal-21 and the reproduction recipe by @devcarbon-com -- I know it's not settled completely yet but I feel the end of the long tunnel is close 😀

josephmturner commented 3 months ago

Would the with-silent-modifications macro be of use here?

akashpal-21 commented 3 months ago

Would the with-silent-modifications macro be of use here?

Yes, it works just as well in my test. The documentation says not to use it if the buffer content gets changed, which the save hooks do, but they restore the content to what it was, so in theory it should work just as well.
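A quick way to check this outside org-transclusion is sketched below (`my/silent-modification-demo` is a hypothetical name). As far as I understand it, `with-silent-modifications` binds `buffer-undo-list` to t while its body runs, so changes made inside it record no undo entries. The body here only simulates the remove-then-add cycle by erasing and re-inserting the same text.

```elisp
;; Illustration only; `my/silent-modification-demo' is a hypothetical name.
(defun my/silent-modification-demo ()
  "Return non-nil if `with-silent-modifications' left the undo list untouched."
  (with-temp-buffer
    (buffer-enable-undo)            ; temp buffers start with undo disabled
    (insert "#+transclude: [[file:source.org]]\n")
    (let ((before buffer-undo-list))
      (with-silent-modifications
        ;; Simulate remove-then-add: the text ends up as it started.
        (let ((text (buffer-string)))
          (erase-buffer)
          (insert text)))
      (eq before buffer-undo-list))))
```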

nobiot commented 3 months ago

One thing I would also like to try is to use write-contents-functions. With it, we may be able to avoid before-save-hook and after-save-hook combo, which seems to be a big part of this whole set of issues.

More concretely, this idea looks like this:

(defun org-transclusion-save-buffer ()
  (with-silent-modifications
    (org-transclusion-before-save-buffer)
    (write-file buffer-file-name)
    (org-transclusion-after-save-buffer))
  t)

(add-hook 'write-contents-functions #'org-transclusion-save-buffer nil :local)

@akashpal-21, would you mind sharing a way to test this idea, to see if it does anything positive to resolve the issue of infinite loop? I'd like to quickly see if this idea is a viable avenue to explore...

akashpal-21 commented 3 months ago

@nobiot We cannot use `write-file' inside the function, or else it results in its own infinite loop, which Emacs halts with this warning:

Saving file /home/akash/Desktop/test/target.org... [227 times]
while: Lisp nesting exceeds ‘max-lisp-eval-depth’: 1601

This is working

(defun org-transclusion-activate ()
  "Activate Org-transclusion hooks and other setups in the current buffer.
This function does not add transclusions; it merely sets up hooks
and variables."
  (interactive)
  (add-hook 'write-contents-functions #'org-transclusion-save-buffer)
  (add-hook 'kill-buffer-hook #'org-transclusion-before-kill nil t)
  (add-hook 'kill-emacs-hook #'org-transclusion-before-kill nil t)
  (add-hook (if (version< org-version "9.6")
                'org-export-before-processing-hook
              'org-export-before-processing-functions)
            #'org-transclusion-inhibit-read-only nil t)
  (org-transclusion-yank-excluded-properties-set)
  (org-transclusion-load-extensions-maybe))

(defun org-transclusion-deactivate ()
  "Deactivate Org-transclusion hooks and other setups in the current buffer.
This function also removes all the transclusions in the current buffer."
  (interactive)
  (org-transclusion-remove-all)
  (remove-hook 'write-contents-functions #'org-transclusion-save-buffer)
  (remove-hook 'kill-buffer-hook #'org-transclusion-before-kill t)
  (remove-hook 'kill-emacs-hook #'org-transclusion-before-kill t)
  (remove-hook (if (version< org-version "9.6")
                   'org-export-before-processing-hook
                 'org-export-before-processing-functions)
               #'org-transclusion-inhibit-read-only t)
  (org-transclusion-yank-excluded-properties-remove))

(defun org-transclusion-save-buffer ()
  (with-silent-modifications
    (org-transclusion-before-save-buffer)
    (write-region (point-min) (point-max) buffer-file-name)
    ;;(write-file buffer-file-name)      ;; do not use write-file
    (org-transclusion-after-save-buffer))
  t)

To see whether it is working, we need to look for changes to buffer-undo-list, which this doesn't make. So we can conclude(?) it's working as intended.

nobiot commented 3 months ago

So you use write-region instead of write-file, and this is the only difference from my idea?

To see whether it is working, we need to look for changes to buffer-undo-list, which this doesn't make. So we can conclude(?) it's working as intended.

I think so :). Thank you!

I feel this is a simpler implementation and improvement over the original with before/after-save-hook, and fixes the infinite loop issue as we know it. Do you agree, @akashpal-21, @josephmturner, all?

I can push this version of the fix and leave it there... In terms of the timing, @josephmturner, I think I should leave this fix out of the v1.4 release so that org-transclusion does not block you on your side of the work (I will look at #242 before v1.4 to include it, soon).

nobiot commented 3 months ago

@akashpal-21 Would you mind sending a patch or PR with your name on it? I would love to have your name recorded in the GitHub repo and git commit log as the contributor to this part, if you do not mind.... For this, I'd need to ask you to do the FSF copyright assignment paperwork - it's quick to start if you agree.

Contributing part of README has a link to a guide.

It has been great help and thank you.

akashpal-21 commented 3 months ago

@nobiot sure, let me fill up their form -- I will also test it for a few days to be absolutely sure it doesn't result in other complications and then make the final PR. Will be done by the following week.

Thanks for your final advice - the solution wouldn't have been elegant without it.

Best and thank you, Akash.

nobiot commented 3 months ago

Thank you.

In addition, I think this (add-hook 'write-contents-functions #'org-transclusion-save-buffer) and removal should be local, like the other add-hooks.

nobiot commented 3 months ago

@akashpal-21

I have done a quick test and also a bit more investigation.

In this info node (you can evaluate it in Emacs) (info "(elisp) Writing to Files"), you have this paragraph (my emphasis)

You can write the contents of a buffer, or part of a buffer, directly to a file on disk using the ‘append-to-file’ and ‘write-region’ functions. Don’t use these functions to write to files that are being visited; that could cause confusion in the mechanisms for visiting.

As an alternative, this below might work. The difference is this:

  1. Use write-file (avoid write-region per the documentation above)
  2. Use write-file-functions (you can inspect how this behaves in basic-save-buffer) [Edit]: local-write-file-hooks and write-file-hooks are obsolete; use write-file-functions set locally

This seems to avoid the loop in save-buffer you observe, but it does not seem to correctly remove transclusion in the saved file.

I need to run and come back later. Just a quick report of what I am finding.

[Edit]: I tested again. The code below seems to work on my end.

(add-hook 'write-file-functions #'org-transclusion-save-buffer nil :local)

(defun org-transclusion-save-buffer ()
  ;; Remove the hook so that `write-file' does not call this function infinitely
  (remove-hook 'write-file-functions #'org-transclusion-save-buffer :local)
  (with-silent-modifications
    (org-transclusion-before-save-buffer)
    (write-file buffer-file-name)
    (org-transclusion-after-save-buffer))
  ;; Add the hook again after the buffer is saved to file.
  (add-hook 'write-file-functions #'org-transclusion-save-buffer nil :local)
  ;; Return t to avoid the normal save behavior to be processed in
  ;; `save-buffer'. See docstring of `write-file-functions'.
  t)

nobiot commented 3 months ago

Tested positively (no infinite loop) in the following cases:

  1. Repro procedure by @devcarbon-com
  2. Repro procedure by @akashpal-21

1. Repro procedure by @devcarbon-com

I went back to the repro procedure by @devcarbon-com at https://github.com/nobiot/org-transclusion/issues/177#issuecomment-1880265546.

2. Repro procedure by @akashpal-21

https://github.com/nobiot/org-transclusion/issues/177#issuecomment-2070272246

I haven't been able to reproduce the infinite loop with this recipe, so my test is not valid, but just to report that the change above with write-file-functions does not cause infinite loops.

Moving forward

As indicated (https://github.com/nobiot/org-transclusion/issues/177#issuecomment-2104699928):

  1. I will wait for @akashpal-21 for a PR for this infinite loop issue (thank you). I will NOT consider this PR to be part of v1.4 -- I want to leave it in the ELPA-devel for any regression, etc. Hopefully it will be part of v1.5, or 1.4.x as a bugfix update.
  2. I would appreciate it if others could also test the two repro procedures above...
  3. Separately, I will look at https://github.com/nobiot/org-transclusion/pull/242 before v1.4 to include it, soon.

josephmturner commented 3 months ago

@nobiot I tried a week ago and again this evening to reproduce the issue according to @devcarbon-com's instructions, but I never got the infinite loop. Instead, the file is saved as expected. I'm on GNU Emacs 29.3 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.41, cairo version 1.18.0). Is it possible to reproduce the error in a more automated way? Please try adjusting the following snippet until the error is reproduced consistently in emacs -Q:

(let ((code-file (make-temp-file "org-transclusion-test-code" nil ".el"))
      (org-file (make-temp-file "org-transclusion-test-org" nil ".org")))
  ;; *Change to location on your machine where org-transclusion is installed.*
  (add-to-list 'load-path "~/.local/src/org-transclusion/")
  (load-library "org-transclusion")

  (with-temp-file code-file
    (insert "(defun bar ()\n")
    (insert "  (interactive)\n")
    (insert "  (message \"hello\"))"))

  (with-current-buffer (find-file org-file)
    (org-transclusion-mode +1)
    (insert "* OT test\n")
    ;; Colon after #+transclude intentionally omitted
    (insert
     (format "#+transclude [[./%s::bar][bar]] :thing-at-point defun  :src elisp"
             (file-relative-name code-file)))
    (with-demoted-errors (org-transclusion-add))
    (search-backward "transclude")
    (forward-word)
    (insert ":")
    (org-transclusion-add)
    (save-buffer)
    (undo)
    (save-buffer)))

The above snippet is my attempt to automate @devcarbon-com's instructions. It does not reproduce the issue on my machine.

Regardless, this is an excerpt from the docstring for write-file-functions:

These hooks are considered to pertain to the visited file.
So any buffer-local binding of this variable is discarded if you change
the visited file name with M-x set-visited-file-name, but not when you
change the major mode.

So you may want to consider setting after-set-visited-file-name-hook if you want org-transclusion-save-buffer to stick around when the buffer's file name is changed.
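A minimal sketch of that suggestion, assuming the `org-transclusion-save-buffer` function proposed earlier in this thread is the one installed on `write-file-functions`. The helper name `my/org-transclusion-restore-save-hook` is hypothetical and not part of org-transclusion; whether this is the right place to restore the hook would need testing.

```elisp
;; Hypothetical helper; the name is illustrative, not part of org-transclusion.
(defun my/org-transclusion-restore-save-hook ()
  "Re-add the buffer-local save hook after the visited file name changes.
Assumes `org-transclusion-save-buffer' is the save function proposed
in this thread."
  (when (bound-and-true-p org-transclusion-mode)
    (add-hook 'write-file-functions #'org-transclusion-save-buffer nil :local)))

;; Run the helper whenever `set-visited-file-name' finishes, since that
;; function discards buffer-local values of `write-file-functions'.
(add-hook 'after-set-visited-file-name-hook
          #'my/org-transclusion-restore-save-hook)
```

The helper only acts in buffers where `org-transclusion-mode` is on, so the global hook should be harmless elsewhere.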

akashpal-21 commented 3 months ago

@nobiot

I was using this

(defun org-transclusion-save-buffer ()
  "Save buffer protocols. Ensures file on disk is cleaned of transclusions;
Before writing to disk run `org-transclusion-before-save-buffer' which removes
active transclusions and generates a list of transclusions that were removed;
after writing to disk is complete, re-enable transclusions as and how they were
by running `org-transclusion-after-save-buffer' over the previously generated
list.

Run within `with-silent-modifications' so that none of this is recorded in the
`buffer-undo-list' of the buffer concerned."
  (with-silent-modifications
    (save-restriction
      (widen)
      (org-transclusion-before-save-buffer)
      (write-region (point-min) (point-max) buffer-file-name nil t)
      (org-transclusion-after-save-buffer)))
  t)

With write-region called with `nil t' as the last two arguments (that is, APPEND nil and VISIT set to `t'), I can see no problem with this implementation as is from my side.
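For readers unfamiliar with the VISIT argument, here is a small self-contained demo of what `write-region` does when VISIT is `t`. It runs in a scratch buffer and does not involve org-transclusion; the function name `my/write-region-visit-demo` is made up for illustration.

```elisp
;; Hypothetical demo function; no org-transclusion code is involved.
(defun my/write-region-visit-demo ()
  "Write a temp buffer with VISIT set to t and return its modified flag."
  (let ((file (make-temp-file "write-region-visit-demo")))
    (unwind-protect
        (with-temp-buffer
          (insert "hello\n")
          ;; Arguments: START END FILENAME APPEND VISIT.
          ;; APPEND is nil (overwrite), VISIT is t.
          (write-region (point-min) (point-max) file nil t)
          ;; With VISIT t, `write-region' marks the buffer as not modified.
          (buffer-modified-p))
      (delete-file file))))
```

This only illustrates the modified-flag side of VISIT; it says nothing about whether using `write-region` on a visited file is safe in the cases the Elisp manual warns about.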

akashpal-21 commented 3 months ago

@nobiot @josephmturner

can we get a report on the values of the text-props as it passes through org-transclusion-remove such as

++ text-props during save as it passes through `#'org-transclusion-remove`
Variables:
beg: 1
end: 17
keyword-plist: (:link [[file:source.org]] :current-indentation 0)
indent: 0
keyword: #+transclude: [[file:source.org]]
tc-pair-ov: #<overlay from 1 to 17 in source.org>

etc?

The infinite recursion happens when beg and end goes missing.

What I am trying to say is that the root cause of the infinite recursion is that the text-properties of the overlay may get corrupted for any number of reasons, and if the beg and end of a text prop are corrupted then org-transclusion-remove-all runs into trouble -- so our real solution should be some sort of self-correction mechanism for the remove function.
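One way to collect part of such a report without touching org-transclusion itself is a small helper that prints what is at point. This is only a sketch: `my/org-transclusion-report-at-point` is a hypothetical name, and it shows the raw text properties and overlay bounds rather than the local `beg` and `end` variables inside `org-transclusion-remove`.

```elisp
;; Hypothetical debug helper; not part of org-transclusion.
(defun my/org-transclusion-report-at-point ()
  "Show text properties and overlay bounds at point.
Return them as a list (TEXT-PROPS OVERLAYS)."
  (interactive)
  (let ((props (text-properties-at (point)))
        (ovs (mapcar (lambda (ov)
                       (list :beg (overlay-start ov)
                             :end (overlay-end ov)
                             :props (overlay-properties ov)))
                     (overlays-at (point)))))
    (message "text-props: %S\noverlays: %S" props ovs)
    (list props ovs)))
```

Place point inside a transcluded region and run it with M-x before and after a save to compare the values.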

akashpal-21 commented 3 months ago

This solution will not take care of the infinite recursion once and for all -- but this improvement is warranted nevertheless since users might want to save their buffer-undo-lists and the save-hooks should definitely not touch them.

But for the infinite recursion - if and how it exits - it has to be solved in the org-transclusion-remove function by making it fault tolerant - and detect corruptions when beg and end are equal.
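A rough sketch of what such a guard could look like, under the assumption that the caller already has the `beg` and `end` values in hand. Both function names are hypothetical, and this is not how `org-transclusion-remove` is currently written; it only illustrates refusing to proceed when the bounds are missing or equal.

```elisp
;; Hypothetical helpers; names and structure are illustrative only.
(defun my/org-transclusion-bounds-valid-p (beg end)
  "Return non-nil if BEG and END delimit a non-empty region."
  (and (integer-or-marker-p beg)
       (integer-or-marker-p end)
       ;; A marker that points nowhere makes `<' signal an error;
       ;; treat that case as invalid too.
       (ignore-errors (< beg end))))

(defun my/org-transclusion-remove-guarded (beg end remove-fn)
  "Call REMOVE-FN only when BEG and END look sane.
Otherwise signal a `user-error' instead of continuing."
  (if (my/org-transclusion-bounds-valid-p beg end)
      (funcall remove-fn)
    (user-error "Transclusion bounds look corrupted (beg=%S, end=%S); not removing"
                beg end)))
```

Signalling an error stops the save-time loop at the first bad region instead of letting it repeat; whether to abort or to try repairing the bounds is a separate design decision.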

akashpal-21 commented 3 months ago

@all

One final hypothesis I wish to present for the enduring problem that is faced by all of us trying to replicate the issue - some of us can replicate consistently on a certain recipe while some of us cannot - it is because we are fundamentally dealing with a non-deterministic system. EMACS GARBAGE COLLECTION - for more certainty we need more data and collaboration that cannot be done over the internet.

It may be that those of us who fall into this problem through undo list corruption have a certain characteristic to our machines - maybe in the domain of available memory or what not? This is my instinct -- you'd have to determine whether this is a pure conspiracy theory because elisp is by design deterministic, or whether there is some truth to the claim.

@nobiot We should defer the PR to a completely different discussion - that is general improvement to the save protocol and distinguish it from this problem altogether.

For the problem in my opinion has to be solved permanently in the org-transclusion-remove function and not anywhere else -- but this should be above and all superfluous - an insurance mechanism -- a fault tolerance protocol.

nobiot commented 3 months ago

@nobiot We should defer the PR to a completely different discussion - that is general improvement to the save protocol and distinguish it from this problem altogether.

Sounds reasonable to me. Agree.

I just re-compiled Emacs 29.1.90; now I cannot reproduce with devcarbon-com's procedure. So I agree it's not deterministic... or there is something we don't know yet.

For the problem in my opinion has to be solved permanently in the org-transclusion-remove function and not anywhere else -- but this should be above and all superfluous - an insurance mechanism -- a fault tolerance protocol.

In testing 29.1.90 this time, beg and end passed to remove look fine.

nobiot commented 3 months ago

@akashpal-21

My latest understanding is that:

  1. buffer-undo-list is NOT a direct cause of the infinite recursion.
  2. The direct cause is org-transclusion-remove (more specifically the beg and end variables in it).
  3. The "corruption" of buffer-undo-list may cause 2., but we do not know exactly how this happens (you suspect it may be a combination of buffer-undo-list and GC in some way).

Is this correct?

This understanding is from my reading of how you phrase #245 and this from your previous comment:

@nobiot We should defer the PR to a completely different discussion - that is general improvement to the save protocol and distinguish it from this problem altogether.

For the problem in my opinion has to be solved permanently in the org-transclusion-remove function and not anywhere else -- but this should be above and all superfluous - an insurance mechanism -- a fault tolerance protocol.

Thank you.

akashpal-21 commented 3 months ago

@nobiot Yes, that is my understanding - buffer-undo-list corruption is simply one way in which the text-properties regarding BEG and END may get corrupted and be set as equal -- in which case org-transclusion-remove runs into a problem.

But there may be N number of ways in which the text-property may get corrupted - and for a permanent fix - we should make org-transclusion-remove fault tolerant -- detect explicitly when BEG and END have been made equal by whatever process and refuse to process as is--

Maybe even try to recreate the BEG and END from the keyword-plist since the overlay still exists it just points wrong --

However that property may get corrupted.