joaotavora / eglot

A client for Language Server Protocol servers
GNU General Public License v3.0
2.27k stars 200 forks source link

Emacs freezes (~10s) with big node_modules directory when starting eglot #697

Closed melias122 closed 3 years ago

melias122 commented 3 years ago

Hey I am trying to move from lsp-mode to eglot but I have some issues one of the projects I currently work on. As title says when starting eglot in project containing node_modules (~50k files), emacs freezes for some time. When I remove node_modules emacs does not freeze. Just to note, lsp-mode does not have this issue.

LSP transcript - M-x eglot-events-buffer (mandatory unless Emacs inoperable)

[server-request] (id:2) Wed May 26 14:36:25 2021:
(:jsonrpc "2.0" :method "client/registerCapability" :params
      (:registrations
       [(:id "workspace/didChangeWatchedFiles-0" :method "workspace/didChangeWatchedFiles" :registerOptions
         (:watchers
          [(:globPattern "**/*.{go,mod,sum}" :kind 7)
           (:globPattern "{/home/melias122/code/axonpro/mydoctor/cmd,/home/melias122/code/axonpro/mydoctor/cmd/import,/home/melias122/code/axonpro/mydoctor/cmd/jwtkeygen,/home/melias122/code/axonpro/mydoctor/cmd/mydoctor-frontend,/home/melias122/code/axonpro/mydoctor/cmd/mydoctord,/home/melias122/code/axonpro/mydoctor/config,/home/melias122/code/axonpro/mydoctor/db,/home/melias122/code/axonpro/mydoctor/db/minio,/home/melias122/code/axonpro/mydoctor/db/mysql,/home/melias122/code/axonpro/mydoctor/demo,/home/melias122/code/axonpro/mydoctor/hooks,/home/melias122/code/axonpro/mydoctor/http,/home/melias122/code/axonpro/mydoctor/imageutil,/home/melias122/code/axonpro/mydoctor/import,/home/melias122/code/axonpro/mydoctor/import/codebook,/home/melias122/code/axonpro/mydoctor/import/imports,/home/melias122/code/axonpro/mydoctor/import/xmlmodel,/home/melias122/code/axonpro/mydoctor/import/xmlmodel/dpxml,/home/melias122/code/axonpro/mydoctor/import/xmlmodel/liekxml,/home/melias122/code/axonpro/mydoctor/import/xmlmodel/ockovacikalendarxml,/home/melias122/code/axonpro/mydoctor/import/xmlmodel/zpxml,/home/melias122/code/axonpro/mydoctor/internal,/home/melias122/code/axonpro/mydoctor/internal/ambulanceserver,/home/melias122/code/axonpro/mydoctor/internal/authserver,/home/melias122/code/axonpro/mydoctor/internal/clinicserver,/home/melias122/code/axonpro/mydoctor/internal/codebookserver,/home/melias122/code/axonpro/mydoctor/internal/infoserver,/home/melias122/code/axonpro/mydoctor/internal/registrationserver,/home/melias122/code/axonpro/mydoctor/internal/userserver,/home/melias122/code/axonpro/mydoctor/jwt,/home/melias122/code/axonpro/mydoctor/log,/home/melias122/code/axonpro/mydoctor/mock,/home/melias122/code/axonpro/mydoctor/mocks,/home/melias122/code/axonpro/mydoctor/mydoctor,/home/melias122/code/axonpro/mydoctor/proto,/home/melias122/code/axonpro/mydoctor/pwhash,/home/melias122/code/axonpro/mydoctor/sms,/home/melias122/code/axonpro/mydoctor/third_party,/home/melias122/code/axonpro/mydoctor/valid,/home/melias122/code/axonpro/mydoctor/version,/home/melias122/code/axonpro/mydoctor/web}" :kind 7)]))])
      :id 2)
[client-reply] (id:2) Wed May 26 14:36:33 2021:
(:jsonrpc "2.0" :id 2 :result nil)
joaotavora commented 3 years ago

[ You skip some parts of the template that the template asks you not to skip. You are missing a minimal reproduction recipe. I need to be able to reproduce the error. Please, next time, don't skip the parts that say DO NOT SKIP. Alternatively, open a discussion]

Notheless I think this might be related to https://github.com/joaotavora/eglot/issues/645 or https://github.com/joaotavora/eglot/issues/633. Is this project publicly available somewhere?

danielpza commented 3 years ago

@melias122 do you have a tsconfig.json/jsconfig.json file? can you post it?

melias122 commented 3 years ago

@joaotavora, sorry for that. Repository is private, but I will try to make reproducible repo. I will also try v1.7 if it happens there. @danielpza

jsonconfig.json
{
  "include": [
    "./src/**/*"
  ]
}
joaotavora commented 3 years ago

@melias122 I wish you could also try the scratch/fix-697-attempt branch. Maybe try that before working on the reproducible repo.

I'd be very thankful if you can test this, since the issuers of the other issues didn't follow through.

melias122 commented 3 years ago

Just tried scratch/fix-697-attempt and it is working fine with no freezes. I would be nice if there is a way to specify directories to ignore. Or maybe just follow .gitignore.

joaotavora commented 3 years ago

Or maybe just follow .gitignore.

That's an idea...

jarreds commented 3 years ago

Or maybe just follow .gitignore.

lsp-mode has a customizable lsp-file-watch-ignored list variable for this. That works well, imo.

joaotavora commented 3 years ago

Yes, lsp has knobs for a lot of things, but asking the user to customize yet another knob in what should be an out-of-box experience is sub-optimal. So if following .gitignore works for the majority of cases, I welcome that simplicity.

jarreds commented 3 years ago

Simplicity is why I love eglot. As far as logic for specific version control systems, wonder if there is some inspiration (or code) to be found in Projectile VCS support.

joaotavora commented 3 years ago

Yes, Dmitry Gutov has suggested using project.el for this, so this is probably what I'll try soon.

On Wed, May 26, 2021, 16:12 Jarred Ward @.***> wrote:

Simplicity is why I love eglot. As far as logic for specific version control systems, wonder if there is some inspiration (or code) to be found in Projectile VCS support https://docs.projectile.mx/projectile/2.3/projects.html#version-control-systems .

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/joaotavora/eglot/issues/697#issuecomment-848853708, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAC6PQYB75YYKOQRFUO5Q6LTPUFVRANCNFSM45SBZGLA .

joaotavora commented 3 years ago

Pushed another commit making use of project.el for this. It's already a dependency of eglot.el anyway and is the rightful module to ask about ignored directories.

melias122 commented 3 years ago

Tried, now emacs pauses for ~2 seconds (which is not bad, but still there is pause), but on 835aa0e, there was no pause. My root .gitignore has:

**/node_modules

With empty .gitignore it is also ~2 seconds. Any clue why?

dgutov commented 3 years ago

Try evaluating these:

(benchmark 1 '(project-files (project-current)))

and

(benchmark 1 '(delete-dups (mapcar #'file-name-directory (project-files (project-current)))))

...and tell us the numbers.

joaotavora commented 3 years ago

Hmm, I used @dgutov 's suggestion which is faster in some cases but probably slower in others. For example in a full repo of the snowpack repository, it's faster by about 3x, but I guess in your example it's not worth it.

As I wrote:

The directory-finding logic is probably a bit slower than using
eglot--directories-recursively, but since it honours `.gitignores` and
ignores more directories it's much faster overall.  And guaranteed to
create less watchers.

I guess, I'll have to some a bit more work to try and get the best of both worlds.

joaotavora commented 3 years ago

@dgutov my theory if you have a directory with many many files, calling file-name-directory for each one is going to allocate a lot of strings. If project-files were to also return directories in the first place, both that allocation and the NlogN (at best) delete-dups would be avoided.

dgutov commented 3 years ago

It is slower than find, but I never compared against a Lisp-based directory enumerator. It's possible it is slower, of course, but even in gecko-dev (a huge project) the latter form prints "1.936423s" on my machine.

OTOH, it's possible that the list of directories returned by the new approach is longer, and you end up spending more time setting up watches.

joaotavora commented 3 years ago

OTOH, it's possible that the list of directories returned by the new approach is longer, and you end up spending more time setting up watches.

It's likely shorter, since it ignores more stuff. In the snowpack (https://www.snowpack.dev/) repo your approach was faster by about 3x than eglot--directories-recursively (which I have now deleted, but will bring back soon). That's because eglot--directories-recursively returns twice as many directories (and the excess directories were quite useless).

dgutov commented 3 years ago

calling file-name-directory for each one is going to allocate a lot of strings

It's definitely not the most optimal approach, but it shouldn't be too terrible for a one-time action on most small-to-moderate sized projects.

If project-files were to also return directories in the first place

We can add a new defgeneric, sure. Anyone is welcome to suggest an optimal implementation of it for a Git backend.

joaotavora commented 3 years ago

Following up to that, if @melias122 's .gitignore is only **/node_modules and nothing else, both approaches will return the same dirs, but the "make a big string with full dir name for each file then remove duplicates" approach is likely slower.

We can add a new defgeneric, sure. Anyone is welcome to suggest an optimal implementation of it for a Git backend.

I'm not sure about the need for defgeneric or what that means.

I just think that if eglot--directories-recursively could consult project.el's knowledge of what constitute "ignored" directories we would be golden. Maybe that is what project-ignores does?

If I manage that, I guess, eglot--directories-recursively could become project-directories.

joaotavora commented 3 years ago

@dgutov

project-ignores is a compiled Lisp function in `project.el'.

(project-ignores PROJECT DIR)

Return the list of glob patterns to ignore inside DIR.
Patterns can match both regular files and directories.

But how to match them? What datatype exactly is "glob" pattern, is it a regexp?

Oh I see they are actual glob strings, presumably of the find format. And I see project-files uses the find util and passes the globs down... Hmmm I wonder if this isn't going to be slow on windows/mingw...

My current thinking is to use eglot--directories-recursively, and compile the globs using Eglot's glob matcher (which is supposed to be fast) to do the filtering.

In the meantime, I'm reverting the last commit.

dgutov commented 3 years ago

but the "make a big string with full dir name for each file then remove duplicates" approach is likely slower

Don't forget about the extra overhead of several Lisp calls for every directory you are traversing. So I wouldn't be sure.

I'm not sure about the need for defgeneric or what that means.

A new generic function acting on project instances.

Oh I see they are actual glob strings, presumably of the find format.

Or of .gitignore format (with slight adjustment, as documented).

joaotavora commented 3 years ago

Don't forget about the extra overhead of several Lisp calls for every directory you are traversing. So I wouldn't be sure

I'm reasonably sure that's the case. The number of directories is dramatically smaller than the number of file. So if all other things equal, traversing directories is much faster.

The snowpack repo example was especially benevolent to your approach, likely because of the large .gitignores file.

However my ~/tmp directory more closely matches the case being described by @melias122

(defun eglot--directories-recursively (&optional dir)
  "Because `directory-files-recursively' isn't complete in 26.3."
  (cons (setq dir (expand-file-name (or dir default-directory)))
        (cl-loop with default-directory = dir
                 with completion-regexp-list = '("^[^.]")
                 for f in (file-name-all-completions "" dir)
                 if (file-directory-p f)
                 append (eglot--directories-recursively f))))

(defun eglot--directories-recursively-project (&optional dir)
  (delete-dups (mapcar #'file-name-directory
                       (project-files
                        (or (project-current nil dir)
                            (cons 'transient dir))))))

(cl-set-difference
  (eglot--directories-recursively-project "~/tmp/")
  (eglot--directories-recursively "~/tmp/")
  :test 'string=) ; nil

(length (eglot--directories-recursively "~/tmp/")); 678
(length (eglot--directories-recursively-project "~/tmp/")) ; 624 not sure why less...

(benchmark 10 '(eglot--directories-recursively-project "~/tmp/")) ; "Elapsed time: 3.594734s (0.419492s in 10 GCs)"
(benchmark 10 '(eglot--directories-recursively "~/tmp/")); "Elapsed time: 1.836761s (0.382498s in 9 GCs)"

Another way to go about this might be to simply go into project.el and make a version of project-files (called maybe project-directories) that puts type d instead of type f in the find call. If I hack that in, project-files starts returning the directories I want:

(length (project-files (cons 'transient "~/tmp"))); 678
(benchmark 10 '(project-files (cons 'transient "~/tmp"))); "Elapsed time: 2.580373s"

So even find -type d seems to be slower, though not as bad as current project-files. For my heterogenous ~/tmp dir, of course:

❯ find . -type f | wc -l
9932
❯ find . -type d | wc -l
678
joaotavora commented 3 years ago

Final thought on this:

  1. Tried the Elisp glob approach, gave correct correct results but sucked in performance when comparing to find f. Guess find's C globs are much faster.
  2. Many projects, not just the JS snowpack, seem to favour the find/project-files approach slightly.
  3. However, since this particular performance problem happens only in a subset of LSP situations, I think I'm going to revert the last commit. No point in having this subpar version for the only person who still is bothered by the performance problem.
  4. A project-directories addition to project.el is desired. Should be faster than the current project-files approach. Should actually be easy to implement.
  5. @melias122 if you could post some statistics about the distribution of files names in your project (maybe even mangle the file names and keep the directory structure), it'd be nice.
dgutov commented 3 years ago

The more ignore entries there are, the slower the find implementation becomes (Git does some smart file tracking, or pattern precompilation, IDK, that makes this way faster). Likewise, a hand-written tree traversal written in Lisp will become slower with more ignore rules added, because it's fairly difficult to write something other than a loop with comparisons for each directory name against each glob to ignore.

However, since this particular performance problem happens only in a subset of LSP situations, I think I'm going to revert the last commit. No point in having this subpar version for the only person who still is bothered by the performance problem.

No objections from me, but that's why I asked @melias122 to run a couple of basic benchmarks. There might be something else going on.

A project-directories addition to project.el is desired. Should be faster than the current project-files approach. Should actually be easy to implement.

A find-based one? Like we have just demonstrated, I think, it can be faster in some projects, while slower in others.

dgutov commented 3 years ago

Also, when you were testing in ~/tmp, it wasn't a Git-backed directory, so you weren't taking advantage of git ls-files's performance boost.

joaotavora commented 3 years ago

because it's fairly difficult to write something other than a loop with comparisons for each directory name against each glob to ignore.

Not only difficult, but it would seem downright impossible. More ignore rules means less things to traverse. Anyway, find -based on Elisp-based should suffer proportionally here.

The point I hope to have demonstrated with the ~/tmp benchmark is that allocating a bunch of useless strings for a 600 dir, 10000 file case can add a lot of time and that it's fairly easy to get that time back.

Also, when you were testing in ~/tmp, it wasn't a Git-backed directory, so you weren't taking advantage of git ls-files's performance boost.

Ah that, may be it indeed.

A find-based one? Like we have just demonstrated, I think, it can be faster in some projects, while slower in others.

Yes, but if you combine it with the git ls-files thing it should be the fastest. That's why I showed the "hacked project-files with type d" case. Let's wait for @melias122 numbers of .

melias122 commented 3 years ago

@dgutov "Elapsed time: 1.147619s" and "Elapsed time: 1.225180s".

EDIT: When I run them both multiple times they are ~1.2s.

melias122 commented 3 years ago

@joaotavora attaching project file in project just find .. Hope it helps. p.zip

joaotavora commented 3 years ago

Thanks, you have 70k files, which is large but not unheard of.

"Elapsed time: 1.147619s" and "Elapsed time: 1.225180s".

These don't suggest any significant impact of the string allocation (though it seems to be there, and systematically so).

Can you rewind to https://github.com/joaotavora/eglot/commit/835aa0e9b351e371be82f0661e438b13e641a5eb, where you say there was "no pause", and evaluate (benchmark-run '(eglot--directories-recursively "path-to-your-project")) and tell us the times?

melias122 commented 3 years ago

@joaotavora on https://github.com/joaotavora/eglot/commit/835aa0e9b351e371be82f0661e438b13e641a5eb it is 1.8849999999999997e-06.

joaotavora commented 3 years ago

Hmm, that seems just a tad too suspiciously fast, can you paste the results of evaluating each of

(length (eglot--directories-recursively "path-to-your-project"))
(length (delete-dups (mapcar #'file-name-directory (project-files (project-current))))
melias122 commented 3 years ago

Both returned: 2 (#o2, #x2, ?\C-b)

joaotavora commented 3 years ago

Hmmm, it seems you're not running this in the correct directory, maybe it's the instructions I gave you, which are confusing. Go to your scratch buffer and do this first:

M-x cd RET /path/to/the/root/directory/of/the/project/thats/giving/you/trouble

Now run again, but without the "path-to-your-project" as earlier:

M-: (length (eglot--directories-recursively)) RET
M-: (length (delete-dups (mapcar #'file-name-directory (project-files (project-current)))) RET

It should give you some number, like 100-300 (around the number of dirs in your project I guess).

When you confirm that it's returning plausible results, please run the two benchmarks again, also with M-: and in the *scratch* buffer. Just replace the length by benchmark 1 and add a ' before the form.

dgutov commented 3 years ago

@joaotavora

Yes, but if you combine it with the git ls-files thing it should be the fastest.

Not sure I understand. How to combine them? Either we use git ls-files (and then postprocess the strings), or find -type d.

melias122 commented 3 years ago

@joaotavora

218 (#o332, #xda)
7663 (#o16757, #x1def)

git ls-files | xargs -n 1 dirname | uniq | wc -l also gives 218

joaotavora commented 3 years ago

Not sure I understand. How to combine them? Either we use git ls-files (and then postprocess the strings), or find -type d.

Right, they're not combinable. I was confused. The git ls-files output seems pre-sorted though, so if the string overhead is the is what I saw earlier, it may make sense to use this to our advantage. But clearly, something else is going on here.

joaotavora commented 3 years ago

@melias122

Ah, that makes much more sense. And it also means that project-files is somehow failing at its mission.

dgutov commented 3 years ago

The git ls-files output seems pre-sorted though, so if the string overhead is the is what I saw earlier, it may make sense to use this to our advantage.

Yeah, it might be amenable to some clever parsing, I suppose.

And it also means that project-files is somehow failing at its mission.

I suppose the next step is to compare the actual outputs and see what's included that shouldn't be. And why.

Compared to "plain" ls-files, project--vc-list-files also includes untracked files, for obvious reasons (ones not mentioned in .gitignore). And thus, if @melias122 has a lot of such files and directories, that would explain the difference.

But I don't see why eglot--directories-recursively would ignore them, though.

joaotavora commented 3 years ago

I suppose the next step is to compare the actual outputs and see what's included that shouldn't be. And why.

@melias122 can you try

(cl-subseq
 (cl-set-difference (delete-dups (mapcar #'file-name-directory (project-files (project-current))))
                    (eglot--directories-recursively)
                    :test #'string=)
                    0 30)

And post the output here?

But I don't see why eglot--directories-recursively would ignore them, though.

It ignores directories that start with ., an heuristic (which appears to work here).

joaotavora commented 3 years ago

It ignores directories that start with ., an heuristic (which appears to work here).

Inspecting p.zip, there's a .cache directory but inside it there aren't 7000+ directories.

joaotavora commented 3 years ago

Inspecting p.zip, there's a .cache directory but inside it there aren't 7000+ directories.

And it's inside node_modules anyway, so I'm at a loss here.

melias122 commented 3 years ago

@joaotavora That gave me:

("/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/.cache/babel-loader/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/.cache/eslint-loader/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/.cache/terser-webpack-plugin/content-v2/sha512/14/fb/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/.cache/terser-webpack-plugin/content-v2/sha512/5b/58/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/.cache/terser-webpack-plugin/index-v5/09/fb/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/.cache/terser-webpack-plugin/index-v5/f0/c6/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/.cache/vue-loader/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/@babel/code-frame/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/@babel/code-frame/lib/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/@babel/core/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/@babel/core/lib/config/" "/home/melias122/code/axonpro/mydoctor/web/adminapp/node_modules/@babel/core/lib/config/files/" ...)
dgutov commented 3 years ago

@melias122 Any chance you don't have the corresponding entries in .gitignore?

melias122 commented 3 years ago

@dgutov I have **/node_modules in root .gitignore and also in adminapp/.gitignore there is node_modules.

joaotavora commented 3 years ago

I think the double star notation is not in all versions of Git. Maybe your shell is picking up one version and project.el is using another. Try just "node_modules".

João

On Thu, May 27, 2021, 13:28 Martin Eliáš @.***> wrote:

@dgutov https://github.com/dgutov I have **/node_modules in root .gitignore and also in adminapp/.gitignore there is node_modules.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/joaotavora/eglot/issues/697#issuecomment-849590265, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAC6PQ3N3HMFVDKWWYR5URTTPY3FFANCNFSM45SBZGLA .

dgutov commented 3 years ago

I don't know, either seems sufficient to have the directory ignored in my local testing.

Even if just adminapp/.gitignore contains node_modules, (project-files (project-current)) does not return any files from within its contents. The only way this wouldn't work is if node_modules is checked into the repository.

melias122 commented 3 years ago

@joaotavora I tried both **/node_modules in root and node_modules in dirs where they are generated. None of theese works for me on master.

@dgutov why would we want to have node_modules in the repository? :-)

Looking in project.el it looks like only .gitignore at the root is supported

(cl-defmethod project-ignores ((project (head vc)) dir)
...
             ;; FIXME: This seems to be Git-specific.
             ;; And / in the entry (start or even the middle) means
             ;; the pattern is "rooted".  Or actually it is then
             ;; relative to its respective .gitignore (of which there
             ;; could be several), but we only support .gitignore at
             ;; the root.
dgutov commented 3 years ago

why would we want to have node_modules in the repository? :-)

That would be a mistake, yes, but it's the only case I know of where Git ignores its own .gitignore entries.

OTOH, you said that git ls-files behaves as expected, so this is probably not it.

Looking in project.el it looks like only .gitignore at the root is supported

That only affects what project-ignores returns, but the internal implementation of project-files for project-vc doesn't use it.

Could you maybe M-x edebug-defun on project--vc-list-files? And see whether it receives some unexpected arguments.

You can also try calling git ls-files -c -o --exclude-standard in the terminal.

melias122 commented 3 years ago

Hmm, I think I found whats wrong. I noticed different results for (length (delete-dups (mapcar #'file-name-directory (project-files (project-current))))) with and without projectile loaded 192 and 7658.

I have this in my config (in use-pacage projectile). I am not sure why this is causing troubles:

(defun m/projectile-project-find-function (dir)
  (let ((root (projectile-project-root dir)))
    (and root (cons 'transient root))))

(with-eval-after-load 'project
  (add-to-list 'project-find-functions 'm/projectile-project-find-function))
joaotavora commented 3 years ago

ah! nice. Kill that and we're good :-). Or maybe put it later in the list (what good is it anyway?).