Why overlays and not text properties

wyuenho / all-the-icons-dired

Adds dired support to all-the-icons

GNU General Public License v3.0

23 stars 7 forks source link

Why overlays and not text properties #16

Closed aikrahguzar closed 2 years ago

aikrahguzar commented 2 years ago

Hi, I have been using this package through doom for a while now and been pretty happy with it but recently on opening a directory with some 15000 files caused dired to choke and using profiling suggested that a lot of the time was being spent here. On investigation it seems a large part of the slowdown is due to the use of overlays. I think the reason is twofold:

1) Large number of overlays is problematic. 2) Overlays need to be recreated whenever files are reordered, filtered, added to dired buffer. I was using dired-filter-mode and dired-filter-group-mode from dired-hacks and I needed to advise them too to refresh the icons and that was making the problem worse. But even with those modes off, it was really slow and I had to use C-g to abort the operation.

I tried adding icons using a display text-property instead and it seems to work much better. It still takes a few second to open that directory but not too bad (the best thing here would be lazy propertizing like font-lock does but I don't know how that works). There seem to be several other advantages beside performance, since now lines can also be added, removed and reordered freely without needing a refresh,

1) Very few functions need to advised. I have only advised dired-insert-set-properties and added a hook to dired-after-readin-hook and everything seems to be working smoothly.

2) When using dired-insert-subtree only the new entries need to be propertized so it is fast even in very large directories. find-dired also seems to behave better for the same reason.

In my very limited experience with it everything seems to be fine and I seemno obvious downsides. So I wanted to ask if there are pitfalls to this approach?

protesilaos commented 2 years ago

While waiting for the maintainer, just to note that org-modern relies on text properties to get the job done. Seems to work just fine: https://github.com/minad/org-modern

@aikrahguzar Do text properties fix issue 13? Perhaps you have your tweaks in some repo I could access? Just to test things.

aikrahguzar commented 2 years ago

While waiting for the maintainer, just to note that org-modern relies on text properties to get the job done. Seems to work just fine: https://github.com/minad/org-modern

@aikrahguzar Do text properties fix issue 13? Perhaps you have your tweaks in some repo I could access? Just to test things.

Hi @protesilaos, they do fix that issue.

I actually don't have a repo. I just copied the file and started messing with it without any version control. I can fork this repo and then put the changes in there but here is basically all the code except the defcustoms, which are about the only lines that remain intact form the original code.

;;;###autoload
(define-minor-mode all-the-icons-dired-mode
  "Display all-the-icons icon for each file in a Dired buffer."
  :lighter " all-the-icons-dired-mode"
  :global t
    (if all-the-icons-dired-mode
        (all-the-icons-dired--setup)
      (all-the-icons-dired--teardown)))

(defun all-the-icons-dired--put-icon (pos) "Propertize POS with icon."
       (let* ((file (dired-get-filename 'relative 'noerror))
              (icon (if (file-directory-p file)
                        (all-the-icons-icon-for-dir file
                                                    :face 'all-the-icons-dired-dir-face
                                                    :v-adjust all-the-icons-dired-v-adjust)
                      (apply 'all-the-icons-icon-for-file file
                             (append
                              `(:v-adjust ,all-the-icons-dired-v-adjust)
                              (when all-the-icons-dired-monochrome
                                `(:face ,(face-at-point))))))))
         (put-text-property (1- pos) pos 'display
                            (if (member file '("." "..")) "    " (concat " " icon " ")))))

(defun all-the-icons-dired--propertize (&optional beg end &rest _)
  "Add icons using text properties from BEG to END.
They defualt to `(point-min)' and `(point-max)'."
       (let ((beg (or beg (point-min)))
             (end (or end (point-max))))
         (with-silent-modifications
           (ignore-errors (save-excursion (goto-char beg)
                                          (while (< (point) end)
                                            (when-let ((pos (dired-move-to-filename)))
                                              (all-the-icons-dired--put-icon pos))
                                            (forward-line 1)))))))

(defun all-the-icons-dired--setup ()
  "Setup `all-the-icons-dired'."
  (add-hook 'dired-after-readin-hook #'all-the-icons-dired--propertize)
  (advice-add 'dired-insert-set-properties :before #'all-the-icons-dired--propertize))

(defun all-the-icons-dired--teardown ()
  "Functions used as advice when redisplaying buffer."
  (remove-hook 'dired-after-readin-hook #'all-the-icons-dired--propertize)
  (advice-remove 'dired-insert-set-properties #'all-the-icons-dired--propertize))

all-the-icons-dired--put-icon and all-the-icons-dired--propertize are adaptation of the basic functions in this repository to use text-properties instead of overlays. Most of the advices and all the overlay management code just gets removed and I changed it the all-the-icons-dired-mode to a global minor mode because that makes more sense to me.

wyuenho commented 2 years ago

The reason for using overlay is documented here.

Opening up a directory with 15k files is not a common use case, in fact, putting that many files into a single directory is frowned upon as one might run into performance issues in some file systems as it bumps up against a max file limit in a directory. That said, I like the idea of using a display text property, using JIT font-lock is also worth experimenting. Feel free to send over a PR with some benchmarks showing perf improvements.

aikrahguzar commented 2 years ago

The reason for using overlay is documented here.

Text properties avoid changing the buffer text so that reason doesn't apply.

Opening up a directory with 15k files is not a common use case, in fact, putting that many files into a single directory is frowned upon as one might run into performance issues in some file systems as it bumps up against a max file limit in a directory.

It was a systems directory and I didn't know how many files it has when I opened it :) but I would rather use dired for all my filesystem exploration and I think while most of them the time I am looking at most a dozen files, I would rather have dired handle much more than that.

That said, I like the idea of using a display text property, using JIT font-lock is also worth experimenting. Feel free to send over a PR with some benchmarks showing perf improvements.

I think JIT solution would be best but I don't know nearly enough about font-locking to implement it. This is very unscientific but I did some incantations of (benchmark-run-compiled n (let ((dired-mode-hook '(all-the-icons-dired-mode))) (kill-buffer (find-file-noselect dir)))) for some directories and values of n. The results were indistinguishable for a directory with about 50 files. For a directory with about 2500 files and n=10, it was 2.7 secs with text-properties and 3.5 seconds with overlays so there is a definite performance improvement but not something that a user would notice. For the directory with 15k files and 5, it was 13 seconds vs 50 secs so it is definitely noticeable now.

If you have idea of more systematic benchmark let me know. Overall I think the reason to go with the change would be more that it simplifies the code than the performance improvement brings. I would open a pr shortly but it is pretty invasive change and I don't know if other package or people's configs rely on the current behavior so that would be a worry.

alexluigit commented 2 years ago

Hi, @aikrahguzar, @wyuenho

I noticed this issue when I was editing the FAQ page of Dirvish, and I hope my post here can be helpful and not too abrupt.

I would rather use dired for all my filesystem exploration and I think while most of them the time I am looking at most a dozen files, I would rather have dired handle much more than that.

I use Dired (Dirvish actually) as my only file manager, so I have similar use cases as you described here. I like your text property implementation, it's simpler and faster, although it might need more testing before merging. But this degree of improvement is still far from usable, wait 2 whole seconds for opening a directory containing 10,000 entries is just unacceptable in 21st century.

using JIT font-lock is also worth experimenting

I don't know if font-lock is a suitable tool for this, but my gut feeling is that using regex to capture the icon string might not be a good idea. After all, match filenames in Dired is already complicated enough, just check out the source code of diredfl.

IMO the core issue here is not overlay vs. text property, but the usage of dired-after-readin-hook, or more specifically: it takes too much time looping through all the files in a directory (especially for those huge ones) and rendering icon for them all at once, for it's O(n) time complexity. Don't get me wrong, I don't think this is a mistake made by all-the-icons-dired and other similar packages like dired-git-info that are using this hook, but more of a design "flaw" of Dired itself, because we don't have any lazy load mechanism inside of Dired, which is the key to solve the performance issue.

This is actually an important concern while I developing the Dirvish package. We need to render icons and other attributes lazily so that we can have useful, possibly different kinds of, extra information displayed in the Dired buffer without sacrificing the speed.

This is how Dirvish opens up a directory with 50,000 entries, with not only icons but also file size, line highlighting and unique file name collapsing.

https://user-images.githubusercontent.com/16313743/173240506-8e03324f-fbaa-446c-baeb-c42caec6eef7.mp4

(my machine is pretty old with just a normal SSD, nothing fancy)

If you are using Dired as your directory navigator like I do, or you are using other Dired packages that loops through the directory all the time upon dired-after-readin-hook, I would strongly suggest you trying out Dirvish. On the other hand, the lazy-loading tricks applied in Dirvish might be an overkill for packages like all-the-icons-dired, if all you need is lightweight usage of Dired + icons, all-the-icons-dired is a better choice.

And since you are here too, hello Prot, thanks a lot for your modus themes! @protesilaos

protesilaos commented 2 years ago

And since you are here too, hello Prot, thanks a lot for your modus themes! @protesilaos

You are welcome! And thanks for sharing your insights here. Now I must check out Dirvish :)

aikrahguzar commented 2 years ago

IMO the core issue here is not overlay vs. text property, but the usage of dired-after-readin-hook, or more specifically: it takes too much time looping through all the files in a directory (especially for those huge ones) and rendering icon for them all at once, for it's O(n) time complexity. Don't get me wrong, I don't think this is a mistake made by all-the-icons-dired and other similar packages like dired-git-info that are using this hook, but more of a design "flaw" of Dired itself, because we don't have any lazy load mechanism inside of Dired, which is the key to solve the performance issue.

I agree with this, needing a few O(n) passes to get the buffer to look like what I might want to is not the best thing. A lazy readin would be better though I think extra passes is the bigger problem. I probably need the listing to complete before I can search with it so I don't mind an eager listing but I would have preferred it I could have found a way to add the text-property for icon at the same time, the built in text properties are added without the extra pass for it.

If you are using Dired as your directory navigator like I do, or you are using other Dired packages that loops through the directory all the time upon dired-after-readin-hook, I would strongly suggest you trying out Dirvish. On the other hand, the lazy-loading tricks applied in Dirvish might be an overkill for packages like all-the-icons-dired, if all you need is lightweight usage of Dired + icons, all-the-icons-dired is a better choice.

I actually do have Dirvish installed but I the only I am using from it at the moment is dirvish-quicksort which is really great. I tried to use it as a dired replacement and I like its mark/copy/move model. But somehow I couldn't get dired-jump to not create a new buffer every time, instead of using the pre-existing dirvish buffer, which was the main reason I didn't end up using it. And I think dirvish buffers also didn't restore when I restarted emacs and restored workspaces. I am Doom Emacs and both of these might be related to some configuration there. I didn't make much of an effort to figure those issues out, mostly because to really take advantage of lazy loading, I also need to disable filter-groups from dired hacks. I really like filter-groups and use them to keep the files I care most about near the top.

alexluigit commented 2 years ago

A lazy readin would be better though I think extra passes is the bigger problem. ... I would have preferred it I could have found a way to add the text-property for icon at the same time.

O(n) and 3 x O(n) are the same thing.

This is the exact same looping structure that dired-insert-set-properties uses

(benchmark-run-compiled 10
  (save-excursion
    (goto-char (point-min))
    (while (< (point) (point-max))
      ;; use (point) to simulate the payload
      (point)
      (forward-line 1))))

For the 50,000 example directory above, it only took 0.08s. In other words, you'd get a < 0.1s improvement for a directory with over 500,000 entries, in theory (the numbers can be different in your hardware, but doesn't change the conclusion).

But somehow I couldn't get dired-jump to not create a new buffer every time, instead of using the pre-existing dirvish buffer

Pushed a patch for this.

After cc0c04f, users should be able to reuse a hidden session (if there is any) by dired-jump or dirvish-dwim when dirvish-keep-alive-on-quit is set to t. It allows us to keep only one dirvish buffer in the buffer list when there are no visible dirvish sessions ... ...

Copied from: https://github.com/alexluigit/dirvish/discussions/57#discussion-4139640 Please post your thoughts or feedback on this topic in this link too. Thanks.

aikrahguzar commented 2 years ago

O(n) and 3 x O(n) are the same thing. They have the same algorithmic complexity but 1/3 of the time is a big performance win.

But you are completely right that this doesn't happen in this case, I had expected the common parts of different passes i.e finding where the file name starts etc to take a significant time but profiling suggests that most of the time is actually spent on finding and inserting the icon so a single pass will help very marginally.

Copied from: alexluigit/dirvish#57 (comment) Please post your thoughts or feedback on this topic in this link too. Thanks.

Thanks a lot, the new behavior helps and I added a comment to the discussion too.

Related to the performance of dired, do you have any ideas that can make dired-filter-groups faster? I like them a lot and would like to have them enabled if I accidentally open a directory with large number of files but the performance is slow. My knowledge of dired internals is non-existent so I don't know if that aspect can be made faster, but if you have any ideas I will like to try in (not so) near future.

alexluigit commented 2 years ago

Related to the performance of dired, do you have any ideas that can make dired-filter-groups faster?

Not that I know of. The reason for that is dired-filter.el heavily relies on dired-revert, which, unfortunately, is very slow especially on large/remote directories. This also puts the filter stack mechanism in a very awkward situation: it's not useful for small directories, yet it's slow on big directories. Besides, this package provides a lot of features that overlaps with dired's inbuilt marking system, of which I'm not a big fan. For the reasons above I've stopped using this package. I'm planning to implement a stripped-down version of this package in the form of a dirvish extension (like dirvish-subtree as to dired-subtree), maybe in the near future.