nobiot / org-transclusion

Emacs package to enable transclusion with Org Mode
https://nobiot.github.io/org-transclusion/
GNU General Public License v3.0
918 stars 45 forks source link

Infinite loop when saving org file #177

Open pp5x opened 1 year ago

pp5x commented 1 year ago

What?

It seems org-transclusion has a bug when saving file and rendering. When the file is saved for the first time on a freshly launched emacs : no issue.

Then when I try to save the file a second (without modification) : emacs goes in an infinite loop and becomes unresponsive.

I managed to make it stop by using pkill --signal USR2 emacs and got a backtrace of what it was doing (see screenshot). The backtrace seems to show an interaction with org-element / org-transclusion was writing an infinite amount of time #+transclude: (see the number of lines. My file is originally 200 lines long). Looks like it's related to the way org-transclusion is saving files (that was mentioned in #109)

I suspect the bug is a bad interaction with org-element--cache-active-p which grows very very quickly. I noticed that running org-element-cache-reset can help when the file is opened again. Emacs also becomes unresponsive when i just move the cursor around the #+transclude: lines... :boom:

Screenshot from 2023-03-24 01-12-26

Doom Emacs

I am running doom-emacs:

generated  mars 24, 2023 01:26:09
system     NixOS 22.11.3299.9ef6e7727f4 (Raccoon) Linux 5.15.103 x86_64
emacs      28.2 ~/.dotfiles/emacs/.emacs.d/
doom       3.0.0-pre PROFILE=_@0 HEAD, master 4e105a95a 2023-03-22 18:29:38 -0400 ~/.doom.d/
shell      /run/current-system/sw/bin/bash
features   CAIRO DBUS FREETYPE GIF GLIB GMP GNUTLS GSETTINGS HARFBUZZ JPEG JSON LIBOTF
      LIBSELINUX LIBSYSTEMD LIBXML2 M17N_FLT MODULES NATIVE_COMP NOTIFY INOTIFY PDUMPER
      PNG RSVG SECCOMP SOUND THREADS TIFF TOOLKIT_SCROLL_BARS X11 XAW3D XDBE XIM XPM
      LUCID ZLIB
traits     batch server-running custom-file
modules    :config use-package :completion company vertico :ui doom doom-dashboard doom-quit
      (emoji +unicode) hl-todo modeline ophints (popup +defaults) treemacs vc-gutter
      vi-tilde-fringe workspaces zen :editor (evil +everywhere) file-templates fold
      (format +onsave) snippets :emacs dired electric undo vc :term vterm :checkers
      syntax (spell +flyspell) grammar :tools direnv editorconfig (eval +overlay) lookup
      lsp magit make :lang (cc +lsp) emacs-lisp markdown nix (org +journal +pretty) rst
      sh yaml :config (default +bindings +smartparens)
packages   (org-auto-tangle) (gift-mode :recipe (:host github :repo csrhodes/gift-mode :files
      (gift-mode.el))) (org-transclusion :recipe (:host github :repo
      nobiot/org-transclusion :branch main :files (*.el)))
elpa       vterm

Is there any chance we could solve this issue? I would love to be able to use this package to design my programming courses! Thanks!

josephmturner commented 4 months ago

I tried reproducing the issue by let-binding gc-cons-threshold to a low value (99) around the snippet I shared above. I also tried inserting some explicit calls to garbage-collect in the snippet. Still no infinite loop.

can we get a report on the values of the text-props as it passes through org-transclusion-remove such as...

@akashpal-21 I was not ever able to reproduce the issue. If it would still be helpful for me to report on these values, would you please send an Elisp snippet for me to run in emacs -Q which returns the relevant values?

akashpal-21 commented 4 months ago

@josephmturner Hmm that's good to hear - I didn't know how to emulate GC circumstances.

To get the values we can debug the org-transclusion-remove function or just use describe-text-properties on the overlay.

For example I cannot reproduce the bug myself - but I can report on partial corruption of one property -- that I talked about earlier that does not result in the infinite loop but causes corruption still. For infinite loop the beg and end of the overlay should equate - this causes the org-transclusion-remove to fall in problem.

Please allow me a minute to attach a screen record to show the partial corruption still - I cannot reproduce the full corruption myself and therefore get the infinite loop -

akashpal-21 commented 4 months ago

@josephmturner Just when I told you I cannot replicate it - I fell into it - now I forgot how to exit when such a situation arises I cannot quit emacs - it wont let me quit in any way

Screenshot_2024-05-12_02-34-49

Allow me a minute to recover.

akashpal-21 commented 4 months ago

https://github.com/nobiot/org-transclusion/assets/46517170/bda275c5-4c45-49f3-9e31-14e45bc0b818

Partial corruption case as noted earlier - I cannot now replicate the infinite loop since beg and end wont coincide - but it did then -- now I dont even know what to say ?

akashpal-21 commented 4 months ago

Ok recreated it again !! Lmao - Please see the video

https://github.com/nobiot/org-transclusion/assets/46517170/c575757a-80f0-4d6a-8762-cccf84c3b5b3

josephmturner commented 4 months ago

Thank you @akashpal-21! I can now reproduce the issue.

I was able to stop the hung Emacs with kill -SIGUSR2 <PID>, which then allowed me to examine the backtrace:

Debugger entered--Lisp error: (quit)
  insert-before-markers("#+transclude: [[./org-transclusion-test-codebYgV7g...")
  org-transclusion-remove()
  org-transclusion-remove-all()
  org-transclusion-before-save-buffer()
  run-hooks(before-save-hook)
  basic-save-buffer(t)
  save-buffer(1)
  funcall-interactively(save-buffer 1)
  command-execute(save-buffer)

Furthermore, I was able to go back to the transclusion buffer, put point on the transclusion and verify that the markers are in the same location:

(equal
  (get-text-property (point) 'org-transclusion-beg-mkr)
  (get-text-property (point) 'org-transclusion-end-mkr))
  ;;   => t
josephmturner commented 4 months ago

Here's a recipe for reproducing the infinite loop on the current main branch (b23ead2) with less manual action.

In emacs -Q, evaluate the following form:

(let ((source-file (make-temp-file "org-transclusion-test-source"))
      (org-file (make-temp-file "org-transclusion-test-org" nil ".org")))
  ;; *Change to location on your machine where org-transclusion is installed.*
  (add-to-list 'load-path "~/.local/src/org-transclusion/")
  (load-library "org-transclusion")

  (with-temp-file source-file (insert "foobar"))
  (with-current-buffer (find-file org-file)
    (insert (format "#+transclude: [[./%s]]" (file-relative-name source-file)))
    (org-transclusion-add)))

Then in the org-mode buffer which appears containing the transcluded word "foobar", repeat the following steps until Emacs hangs:

  1. save the buffer with C-x C-s
  2. undo the buffer history as far as it will go with C-/ C-/ C-/ C-/ C-/ (until No further undo information error)
  3. save the buffer with C-x C-s
  4. redo the buffer history as far as it will go with C-? C-? C-? C-? C-? (until No undone changes to redo error)

On my machine, this reliably reproduces the infinite loop after repeating these steps a few times.

For some reason I don't yet understand, I wasn't able to reproduce the loop with non-interactive calls to save-buffer, undo, and undo-redo (even using call-interactively), so these instructions still involve some manual interaction.

josephmturner commented 4 months ago

I thought not modifying the buffer-undo-list when adding/removing transclusions would solve the problem, but I was wrong. Please ignore the accidental commit above which claims to resolve this issue.

nobiot commented 4 months ago

I have tested different Emacs versions from compiling it from source: 29.1.90, 29.1, 29.2, 29.3; I have not been able to reliably reproduce the infinite loop with @devcarbon-com's procedure -- I used to be able to, but no longer.

I have managed to make it happen a couple of times (I cannot record the exact steps) with 29.1 and 29.3.

Based on @akashpal-21's analysis, I have pushed a new branch and commit to force the infinite loop to occur -- I still do not know exactly when we will arrive at this condition in real use of transclusions, but I think the branch can be used as a test harness to craft a preventive measure and test it.

I also modified @josephmturner's automation code as below to be able to reproduce the infinite loop easily with a command.

Instruction:

  1. (prep) Disable auto-save-visited-mode or auto-save-mode (better to have a full control of buffer-save)
  2. (prep) Checkout the repro-infinite branch
  3. (prep) Adjust the location of ~/src/org-transclusion/ in the command below
  4. (prep) Evaluate test/infinite-loop command
  5. Call test/infinite-loop command
  6. --- STOP HERE TO UNDERSTAND HOW TO UNDO THE INFINITE LOOP :)
  7. Manually save the buffer
  8. (alternative to step 6). Manually remove the transclusion
  9. (After the test) Once test is done, check out main branch and evaluate org-transclusion-remove to reset the test harness...

Infinite loop starts as soon as you save the buffer. Stop it immediately with C-g -- even with my old machine, I get about 2000 lines added within a fraction of a second. You need to delete the transclusion completely in order for you to properly kill the buffer (because of kill-buffer-hook).

(defun test/infinite-loop ()
  (interactive)
  (let ((code-file (make-temp-file "org-transclusion-test-code" nil ".el"))
        (org-file (make-temp-file "org-transclusion-test-org" nil ".org")))
    ;; *Change to location on your machine where org-transclusion is installed.*
    (add-to-list 'load-path "~/src/org-transclusion/")
    (load-library "org-transclusion")

    (with-temp-file code-file
      (insert "(defun bar ()\n")
      (insert "  (interactive)\n")
      (insert "  (message \"hello\"))"))

    (with-current-buffer (find-file org-file)
      ;; Inhibit read-only so that we can easily remove the problem for
      ;; testing purposes.
      (setq-local inhibit-read-only t)
      (org-transclusion-mode +1)
      (insert "* OT test\n")
      ;; Colon after #+transclude intentionally omitted
      (insert
       (format "#+transclude: [[./%s::bar][bar]] :thing-at-point defun  :src elisp"
               (file-relative-name code-file)))
      (org-transclusion-add))))
nobiot commented 4 months ago

@josephmturner, I posted the comment above before noticing your latest comment about recipe for repro. I will come back to it later. Thank you.

akashpal-21 commented 4 months ago

I still do not know exactly when we will arrive at this condition in real use of transclusions,

I also think so - particularly because org-transclusion can never generate this result -- it is impossible for a null file to be transcluded

Nobody should under any normal circumstances trip upon this bug, possibly triggered by users when testing the program's functionality. Requires very rare alignment of known circumstances.

The problem is inherited from the environment it is functioning in - not under normal operations but exceedingly rare alignment of circumstances, the problem really isn't that the problem exists - but that when - in a 1:100000 circumstance it is reached - it results in a catastrophe. The user is imprisoned until they figure out a way to exit. Should the remove protocol refuse to entertain the impossible case of getting a 0 length overlay - it doesn't even need to try to rectify errors - but it should give the user the choice to manually delete the overlays and be allowed to save and quit.

nobiot commented 4 months ago

My first attempt at a preventive measure.

The reproduce recipe from @josephmturner works on my end but only intermittently. So I used 7ad7936 to force infinite loop and to test the preventive measure. It seems to work with multiple transclusions.

akashpal-21 commented 4 months ago

@nobiot It is always a pleasure to see you come up with solutions, I introduced the change to the org-transclusion-content-insert as you suggested to force beg and end to coincide and checked the same with describe-text-properties - the remove works flawlessly. Thank you so much. It is working on my end.

josephmturner commented 4 months ago

@nobiot Your patch-infinite-loop workaround successfully avoids the infinite loop on my machine. Would it be appropriate to say something like

[org-transclusion] Recovering from error: incorrect values for transclusion BEG and END

We could use warn instead of message? This could be an opportunity to ask the user to report the issue, if we have an idea of more data the user could share with us to help figure out a true solution.

Thank you!

nobiot commented 4 months ago

I have managed to reproduce the infinite loop 100% of times on my end -- code in the "details" toggle below.

The theory of what happens is this, and the the code supports the theory.

[Fact / design of org-transclusion]

[Now what happens]

Well, my description using a human language may not be precise and accurate, but based on this theory, I have come up with the code. Now I can reproduce the infinite loop condition 100% of my attempts.

If this theory is confirmed, I think I need to work to get the current "workaround" to be the way forward, removing the use of makers in the current way.

Let me know how you go...

To use the code provided below, adjust the src location, evaluate the snippet, and call the command test/combine. You should see this, along the temporary file as @josephmturner's code does. See the both beg and end markers point to the same location.

image

```emacs-lisp (defun test/repro-loop () (interactive) (let ((source-file (make-temp-file "org-transclusion-test-source")) (org-file (make-temp-file "org-transclusion-test-org" nil ".org"))) ;; *Change to location on your machine where org-transclusion is installed.* (add-to-list 'load-path "~/src/org-transclusion/") (load-library "org-transclusion") (with-temp-file source-file (insert "foobar")) (pop-to-buffer (find-file org-file)) (insert (format "#+transclude: [[./%s]]" (file-relative-name source-file))) (org-transclusion-add) (save-buffer))) (defun test/force-gc () (interactive) (undo) ;; This save-buffer is the key. If you comment it out, the infinite loop won't ;; happen. (save-buffer) ;; Garbage collect is also the key (garbage-collect) (undo-redo) (describe-text-properties 1)) (defun test/combine () (interactive) (test/repro-loop) ;; `undo-boundary' is necessary to get undo to work through calling the test ;; functions in Elisp. (undo-boundary) (test/force-gc)) ```
akashpal-21 commented 4 months ago

My opps at silicon valley have hidden the comment but feels good to know my intuition was correct

https://github.com/nobiot/org-transclusion/issues/177#issuecomment-2071345011 https://github.com/nobiot/org-transclusion/issues/177#issuecomment-2105681231

We finally have a confirmed fix to this issue. Both by simulating the conditions in a deterministic manner and solving it using a more general solution -- rather than specific.

nobiot commented 4 months ago

Can you reproduce the problem on your end with the code, too?

akashpal-21 commented 4 months ago

Can you reproduce the problem on your end with the code, too?

Positively yes.

Screenshot_2024-05-14_03-11-51

josephmturner commented 4 months ago

Can you reproduce the problem on your end with the code, too?

Me too.

[Now what happens]

* In some combination of `undo` and `buffer-save` with transclusions, the markers can temporarily point to non-existing locations in the buffer.

* If the garbage collection happens to run at this moment, it will sweep these pointers. Now they end up pointing to start of the buffer (point 1).

This is very clear. Thank you, @nobiot!

IIUC, org-transclusion-beg-mkr and org-transclusion-end-mkr always point to the current buffer (the buffer containing the text they are applied to). Instead of markers, we could store position numbers in org-transclusion-remember-transclusions by converting each marker to a number in org-transclusion-before-save-buffer. Then, in org-transclusion-after-save-buffer, we'd make a new marker for each number. This may avoid garbage collection at the cost of unnecessary memory usage, code complexity, and potential bugs (what happens if a different before-save-hook or after-save-hook inserts some text and now each number is wrong?). It's a hack -- I'm just thinking "out loud" :)

It would be nice if we could mark certain markers so that GC doesn't collect them. Like specifying "weakness" in make-hash-table.

nobiot commented 3 months ago

Thank you all -- just letting you know that it looks like I won't have much capacity until mid-late June -- I will try to come back earlier to this work here, but I cannot promise. I wanted to let you know before I "disappear". I will try not to go completely radio silent but I may. Talk to you soon =)