rejeep / f.el

Modern API for working with files and directories in Emacs
GNU General Public License v3.0

[Help]: Faster way to search for directories recursively #117

Open jingxuanlim opened 1 year ago

jingxuanlim commented 1 year ago

Contact Details

@jingxlim

New feature

Hi, I want to preface this by saying that this is neither a feature request nor a bug report; it is more of a cry for help.

I wanted my little script to find directories that contain .org files. I also wanted to exclude any directory that is hidden, or that sits under a hidden directory at any level of its path. To this end, I am glad that I found f.el, which did all the heavy lifting.

Here are the few lines of code I wrote. Bear in mind that these are literally the first lines of Emacs Lisp I have ever written, which is why I'm hoping to get some feedback. I never took a course or went through a beginner's tutorial on Emacs Lisp; I just dove right in and looked particular things up as I went. It's probably not the most Lisp-y code (if there's even such a thing) and it is very verbose, but please bear with me!

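(defun is-dotstring (str)
  "Return non-nil if STR starts with a dot followed by a word character."
  (integerp (string-match-p "^\\.\\w.*$" str)))

(defun path-has-dotstring (path)
  "Return non-nil if any component of PATH is a dot-prefixed name."
  (consp (member t (mapcar #'is-dotstring (f-split path)))))

(defun dir-has-filetype (path filetype)
  "Return non-nil if directory PATH directly contains a file with extension FILETYPE."
  (consp (f--files path (equal (f-ext it) filetype))))

;; Recursively collect the non-hidden directories under `root_directory'
;; that directly contain at least one .org file.
(f-directories root_directory
               (lambda (dir)
                 (and (not (path-has-dotstring dir))
                      (dir-has-filetype dir "org")))
               t)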

For the root_directory that I specified, this code took between 1 and 2 minutes to run. Because I also hope to run it at Emacs startup, this speed won't do.

On the other hand, something like the Linux tool find achieved the same thing in less than 5 seconds.

find [root_directory] -name '*.org' -not -path '*/.*' -printf '%h\n' | sort -u

Of course, I am not claiming that I wrote them the same way, so differences in performance are expected.

However, I do want to do this in Emacs, so I'm wondering if there are any ways I could improve on it, using f.el or otherwise (e.g. piping in the results from find). In the case of f-directories, I understand that every directory is visited recursively, even if it is many layers deep into a hidden directory (e.g. .git). Perhaps one way is to write the program in a way that prevents this from happening.
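
The pruning idea in the last two sentences can be sketched with built-in functions only. This is a minimal sketch, not f.el's implementation: `my/org-dirs` is a hypothetical name, and it assumes that "hidden" simply means a name starting with a dot. Because dot-prefixed entries are filtered out before the walk descends, directories such as .git are never entered.

```elisp
(defun my/org-dirs (root &optional extension)
  "Return directories under ROOT that directly contain files ending in EXTENSION.
EXTENSION defaults to \"org\".  Dot-prefixed entries are never visited.
Hypothetical helper, written for illustration only."
  (let ((extension (or extension "org"))
        (pending (list (directory-file-name (expand-file-name root))))
        (result '()))
    (while pending
      (let ((dir (pop pending))
            (has-match nil))
        ;; The regexp "\\`[^.]" keeps only names that do not start with a
        ;; dot, so ".", ".." and every hidden entry are dropped here.
        (dolist (entry (directory-files dir t "\\`[^.]" t))
          (cond
           ;; Skip symlinks entirely so that link cycles cannot loop forever.
           ((file-symlink-p entry))
           ;; Queue visible subdirectories for a later visit.
           ((file-directory-p entry) (push entry pending))
           ;; Remember that this directory holds at least one match.
           ((equal (file-name-extension entry) extension)
            (setq has-match t))))
        (when has-match (push dir result))))
    (sort result #'string<)))
```

Two differences from the code above are worth keeping in mind: hidden files such as .notes.org do not count as matches here, and the root itself is not checked for hidden components. An unreadable directory will signal an error, so real use would probably want a `condition-case` around the `directory-files` call.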

Any help is greatly appreciated! Thanks!

Why this new feature

f.el is great! I just need help using it!

Phundrak commented 1 year ago

A simple way to write what you are looking for would be the following:

(seq-uniq (mapcar #'f-parent
                  (f-files "~/org"
                           (lambda (f) (f-ext-p f "org"))
                           t)))

To explain it simply, f-files searches [root_dir] recursively and keeps only the .org files, thanks to the lambda. f-parent is then called on each remaining entry to get the directory that contains it, and seq-uniq removes the duplicate entries.
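
For comparison, the same three steps (collect files, take the parent, drop duplicates) can be sketched without f.el, using only functions that ship with Emacs 25.1 or later. `my/org-dirs-builtin` is a hypothetical name, and like the snippet above it does not filter hidden directories.

```elisp
(require 'seq)

(defun my/org-dirs-builtin (root)
  "Return the directories under ROOT that contain at least one .org file.
Hypothetical helper: a built-ins-only sketch of the f.el snippet above."
  (seq-uniq
   (mapcar (lambda (file)
             ;; `file-name-directory' keeps a trailing slash;
             ;; `directory-file-name' strips it again.
             (directory-file-name (file-name-directory file)))
           ;; The regexp is matched against the file name without its
           ;; directory part; "\\'" anchors it at the end of the name.
           (directory-files-recursively (expand-file-name root)
                                        "\\.org\\'"))))
```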

On average, it runs in 40ms in my ~/org directory on my machine while your find command seems to run in around 8 to 9ms.
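
If you want to take this kind of measurement on your own machine, here is a minimal sketch of an averaging helper. `my/average-seconds` is a hypothetical name, and the built-in `benchmark-run` macro is another option; this version just makes the arithmetic explicit.

```elisp
(defun my/average-seconds (fn &optional repetitions)
  "Call FN REPETITIONS times (default 10) and return the mean wall time in seconds.
Hypothetical helper, written for illustration only."
  (let ((n (or repetitions 10))
        (start (float-time)))
    (dotimes (_ n)
      (funcall fn))
    ;; `float-time' returns a float, so the division yields a float too.
    (/ (- (float-time) start) n)))

;; Example: time any candidate by wrapping it in a lambda.
;; (my/average-seconds
;;  (lambda () (directory-files-recursively "~/org" "\\.org\\'")))
```

Wall time includes garbage collection and disk cache effects, so the first run after a reboot can look much slower than later ones.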

Note also that this code currently returns absolute paths. If you wish to have relative paths, you can wrap this code in this mapcar:

(mapcar (lambda (f) (f-relative f "[root_dir]"))
        ;; The rest of your code
        )

I tried to see whether faster times could be achieved with shell-command-to-string and your find command, but it turned out much slower, with an average of 0.6s, i.e. around 8 times slower than the Elisp-only code.
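
For anyone who wants to repeat that experiment, here is one way the find command could be called from Elisp. It is a sketch under assumptions: `my/org-dirs-via-find` is a hypothetical name, it uses `process-lines` rather than shell-command-to-string so that no shell is started and no quoting is needed, and it requires GNU find because of -printf. I have not measured it, so I make no claim about how it compares with the timings above.

```elisp
(require 'seq)

(defun my/org-dirs-via-find (root)
  "Return the sorted, unique directories under ROOT reported by GNU find.
Hypothetical helper mirroring the find command from the question."
  (sort
   (seq-uniq
    ;; `process-lines' runs the program directly and returns its output
    ;; as a list of lines; it signals an error on a non-zero exit status.
    (process-lines "find"
                   (directory-file-name (expand-file-name root))
                   "-name" "*.org"
                   "-not" "-path" "*/.*"
                   ;; find itself expands the \n escape in the format.
                   "-printf" "%h\\n"))
   #'string<))
```

Note that find exits with a non-zero status when it hits an unreadable directory, in which case `process-lines` signals an error instead of returning partial results.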

It looks like you also want to ignore hidden files and directories. I will soon publish an update to f-hidden-p to make it consider any file under a hidden directory as hidden itself. That way, if you want to also ignore hidden files and files in hidden directories, you will soon be able to write:

(seq-uniq (mapcar #'f-parent
                  (f-files "~/org"
                           (lambda (f) (and (f-ext-p f "org")
                                            (not (f-hidden-p f))))
                           t)))
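
Until that update is available, a stand-in predicate could look roughly like this. It is only a sketch, not the planned f-hidden-p behavior: `my/under-hidden-p` is a hypothetical name, and it assumes that a path counts as hidden when any component below the search root starts with a dot.

```elisp
(require 'seq)

(defun my/under-hidden-p (file root)
  "Return non-nil if FILE, taken relative to ROOT, has a dot-prefixed component.
Hypothetical stand-in, written for illustration only."
  (seq-some (lambda (part) (string-prefix-p "." part))
            ;; Only the part of the path below ROOT is inspected, so a
            ;; hidden directory above ROOT does not hide everything.
            (split-string (file-relative-name file root) "/" t)))

;; It can replace the f-hidden-p call in the lambda above:
;; (lambda (f) (and (f-ext-p f "org")
;;                  (not (my/under-hidden-p f "~/org"))))
```

A file that is not actually under ROOT yields a relative name starting with "..", which this sketch also treats as hidden.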