universal-ctags / citre

A superior code reading & auto-completion tool with pluggable backends.
GNU General Public License v3.0

freeze when call citre-jump on space character #82

Closed railwaycat closed 2 years ago

railwaycat commented 3 years ago

I use the default setup, with citre-jump used as the backend of xref. When I press M-. (xref-find-definitions) on a whitespace character, Emacs freezes and accepts no input until I press C-g.

It might be good to prompt for user input when citre-jump is called on a whitespace character. As far as I remember, this is quite common behavior for the xref backends of etags and GNU global.

AmaiKinono commented 3 years ago

Your description is a little bit confusing. You are using xref, not the citre-jump command, right?

It looks like you are using a big tags file. Could you try this with a smaller tags file?

> It might be good to prompt for user input when citre-jump is called on a whitespace character

This is what Citre does. I assume you are using a completion enhancement interface like ivy or helm. Since such an interface asks for the candidates as soon as the prompt appears, Citre will get all tags in the tags file before you input anything. For a huge tags file, this looks like freezing.

If you use the vanilla Emacs completion UI, it only calculates completions when you press TAB, so you won't have the problem.

To fix it, we need to invent some magic to not block the UI, like what we did with auto-completion.

AmaiKinono commented 3 years ago

> To fix it, we need to invent some magic to not block the UI, like what we did with auto-completion.

Unfortunately I think this can't be done. Since there's the "completion style" thing in Emacs, we really can't assume the user input should appear in the results. We need to give Emacs the whole collection and let it decide for itself how to filter it, which is what Citre does for now.

railwaycat commented 3 years ago

Yes, exactly as you pointed out, it's quite a big tags file, about 400MB. I tried waiting longer, and this time the completion list showed up after about 1 min.

I'm using helm, and after the long wait the first time, the completion list shows up almost immediately. I think this is acceptable for me.

Thanks for the detailed information; this answers all my questions.

AmaiKinono commented 3 years ago

> I'm using helm, and after the long wait the first time, the completion list shows up almost immediately. I think this is acceptable for me.

It's not helm. It's Citre building a cache of the result. I designed it like this so the user only needs to wait once ;)

railwaycat commented 3 years ago

Glad to know, and thank you for the feature! For a tags file as big as the one I'm working with, it really saves time.

railwaycat commented 2 years ago

I ran into a similar issue again recently.

I'm working on a C++ project whose tags file is about 123M. When I use M-. on whitespace, Emacs gets stuck at 100% CPU for over half an hour and the symbol list still does not show up. I tried several times and can reproduce it on every try. Emacs itself is not frozen: I can press C-g to quit the operation at any time.

I tried again on a smaller C project (curl) whose tags file is around 10M. There I can wait until the symbol list shows up, but it still takes more than 5 mins. After that, the symbol list shows up immediately, so I think the cache works well.

M-. on a non-whitespace symbol works great without any problem. This issue only occurs on whitespace, and only the first time after Emacs starts.

AmaiKinono commented 2 years ago

I would do a test later. I'd like to know:

railwaycat commented 2 years ago

Thank you!

  1. I use citre-update-this-tags-file. It prompts me to create a tags file, and I give an empty input to choose all languages.
  2. I tested on:
    1. 28.1.91 (mac port version) on macOS, M1 Max
    2. 28.1 (built by brew install emacs, no GUI) on macOS, M1
    3. 28.1.50 (built from source code on 06/09) on Linux, a VM on Xeon with performance.

BTW another project I use ctags and Citre on is Envoy (https://github.com/envoyproxy/envoy), which produces a ~100MB tags file. I generated it the same way as in step 1.

masatake commented 2 years ago

> 28.1.50 (built from source code on 06/09) on Linux, a VM on Xeon with performance.

You may be able to launch and use another terminal while the emacs process works hard.

Is the process hogging your CPUs only the emacs process? How about readtags process? If readtags also works hard, we want to know the command line that emacs specified for launching the readtags process.

The top command is helpful for finding out which processes consume the CPUs. You can get the pid of readtags from the output of top.

ps -f $the-pid-of-readtags prints the command line.

railwaycat commented 2 years ago

I can see both the emacs and readtags processes hit 100% CPU when I do M-., but after several minutes the readtags process finishes, leaving only the emacs process with high CPU usage.

Here is the command line of readtags from ps:

/usr/local/bin/readtags -t /home/<my username>/.cache/tags/!home!<my username>!<my project>.tags -Q (not (and $extras ((string->regexp "(^|,) ?(anonymous)(,|$)" :case-fold false) $extras))) -S (<or> (if (and $name &name) (<> (length $name) (length &name)) 0) (if (and $name &name) (<> $name &name) 0)) -Ene -l

The version of ctags I'm using (I built from source code):

> ctags --version
Universal Ctags 5.9.0(5abc6039), Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Aug 16 2022, 20:31:26
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +gnulib_regex, +iconv, +option-directory, +xpath, +json, +interactive, +packcc, +optscript

masatake commented 2 years ago

I would like to know the execution time of readtags and the size of its output.

Could you try the following command line? Note: unlike the command line you kindly reported in the last comment, I added single quote characters to the new command line.

$ time /usr/local/bin/readtags -t /home/<my username>/.cache/tags/!home!<my username>!<my project>.tags -Q '(not (and $extras ((string->regexp "(^|,) ?(anonymous)(,|$)" :case-fold false) $extras)))' -S '(<or> (if (and $name &name) (<> (length $name) (length &name)) 0) (if (and $name &name) (<> $name &name) 0))' -Ene -l | tee /tmp/readtags-$$.tags | wc -l

This command line records the output to /tmp/readtags-$$.tags. The file may be helpful for debugging the emacs/citre side, though it may be impossible to make the file public.

(I know well about readtags. However, my knowledge about the emacs/citre side is limited.)

railwaycat commented 2 years ago

Output of this command:

> time /usr/local/bin/readtags -t /home/<my username>/.cache/tags/\!home\!<my username>\!<my project>\!.tags -Q '(not (and $extras ((string->regexp "(^|,) ?(anonymous)(,|$)" :case-fold false) $extras)))' -S '(<or> (if (and $name &name) (<> (length $name) (length &name)) 0) (if (and $name &name) (<> $name &name) 0))' -Ene -l | tee /tmp/readtags-$$.tags | wc -l
1783600
/usr/local/bin/readtags -t /home/<my username>/.cache/tags/!home!<my username>!<my project>!.tags -  55.85s user 1.57s system 99% cpu 57.422 total
tee /tmp/readtags-$$.tags  0.14s user 1.09s system 2% cpu 57.422 total
wc -l  0.17s user 0.38s system 0% cpu 57.421 total

The tags file of this project is 441M and the output file /tmp/readtags-14229.tags is 426M.

Sorry, I can't share this file, as you said. But if you would like to see one, I can share the output from Envoy, another open source project I'm currently working on, which has a 123M tags file.

masatake commented 2 years ago

426M is too large. Is "anonymous" the string you looked for?

railwaycat commented 2 years ago

It used to be much better. At the time I created this ticket, it usually took 1-2 minutes to show the list of symbols on the first run, on the same project and the same Linux machine.

I believe that when M-. (the shortcut for find definition in Emacs) is pressed on a whitespace character, Citre loads all symbols into a list for lookup. I know this is a heavy task for a tags file > 100M.

masatake commented 2 years ago

Is "anonymous" the string you looked for?

masatake commented 2 years ago

When I pressed \M-. on a space char, Emacs asked me:

Find definitions of:

I wonder what kind of string you gave at the prompt. My guess is "anonymous". Am I correct?

railwaycat commented 2 years ago

I see. I have helm installed. Without helm, the behavior should be the same as hitting <tab> at the Find definitions of: prompt.

railwaycat commented 2 years ago

Hi, I did some profiling on a clean Emacs config with only Citre. I started the profiler with (profiler-start 'cpu), hit M-., and then hit TAB at the Find definitions of: prompt. I waited until the readtags process ended, then waited a short while longer while Emacs was still at 100% CPU.

And here is what I got:

13586  86% - command-execute
13586  86%  - call-interactively
13580  86%   - funcall-interactively
13576  86%    - minibuffer-complete
13576  86%     - completion-in-region
13576  86%      - completion--in-region
13576  86%       - #<compiled -0x1151f601966036ab>
13576  86%        - apply
13576  86%         - #<compiled -0x4b7deebfa9cb6ab>
13576  86%          - completion--in-region-1
13576  86%           - completion--do-completion
13576  86%            - completion--field-metadata
13576  86%             - completion-metadata
13576  86%              - #<lambda 0x134667978ed2cf31>
13576  86%               - let*
13576  86%                - if
13576  86%                 - let
13576  86%                  - cl-remove-duplicates
 9060  57%                   - cl--delete-duplicates
 1456   9%                      cl--position
 4499  28%                   - mapcar
 4109  26%                    + citre-get-tags
  360   2%                    + #<lambda -0x32572b19cb8b13c>

It looks like most of the CPU time is spent in xref-backend-identifier-completion-table.

I'm starting to feel this is not a Citre issue but a behavior of minibuffer completion tools like helm and ivy. I can also try adding ignore directories to reduce the tags file size.
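
For reference, the profile above attributes most of its samples to cl-remove-duplicates, which as far as I know searches the rest of the list for every element and so grows roughly quadratically with the number of names. Below is a minimal sketch of a single-pass alternative using a hash table. my/unique-strings is a hypothetical helper, not part of Citre, and whether it would remove the whole slowdown on a real tags file is untested here.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my/unique-strings (strings)
  "Return STRINGS without duplicates, keeping the first occurrence of each.
Hypothetical helper for illustration, not part of Citre.
Each string is looked up once in a hash table, so the work grows
roughly linearly with the length of STRINGS."
  (let ((seen (make-hash-table :test #'equal :size (length strings)))
        (result nil))
    (dolist (s strings)
      (unless (gethash s seen)
        ;; First time we meet this name: remember it and keep it.
        (puthash s t seen)
        (push s result)))
    ;; `push' built the list backwards, so restore the input order.
    (nreverse result)))

(my/unique-strings '("main" "main" "parse" "main" "parse_url"))
;; => ("main" "parse" "parse_url")
```

One difference to keep in mind: cl-remove-duplicates keeps the last occurrence of each element by default, while this sketch keeps the first. For a list of names that is already sorted, the result should come out in the same order either way.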

AmaiKinono commented 2 years ago

I can't reproduce this.

On my machine, the curl tags file is about 10MiB, and Citre builds the cache in 20 secs. On Envoy (tags file is about 120MiB) it takes 12 min and 30 secs. And my machine is much crappier than yours.

My OS is Arch Linux. I use Emacs 28.1 and ctags 5.9.0(p5.9.20210905.0), all installed from Arch official repository.

I do suggest you try to build the cache in a clean Emacs config. You can do this by evaluating:

(benchmark-run
    (cl-remove-duplicates
     (mapcar
      (lambda (tag) (citre-get-tag-field 'name tag))
      (citre-get-tags
       "Tags file path here" nil nil
       :filter citre-xref--filter
       :sorter (citre-core-sorter '(length name +) 'name)
       :require '(name)))
     :test #'equal))

On my machine and on the curl tags file, it takes exactly 19 secs.

@masatake suspected it's a readtags problem. On my machine readtags is actually very fast. The command line Citre uses to build the cache is

$ readtags -t .tags -Q '(not (and $extras ((string->regexp "(^|,) ?(anonymous)(,|$)" :case-fold false) $extras)))' -S '(<or> (if (and $name &name) (<> (length $name) (length &name)) 0) (if (and $name &name) (<> $name &name) 0))' -Ene -l

(.tags is my tags file name)

On curl it takes about 2 secs. On Envoy it takes < 4 secs. The bottleneck is on the Elisp side. It's because Emacs pipe IO and Elisp itself are dog slow.

I do plan to create an interactive & asynchronous tags filtering tool for Citre, which should make this "xref on whitespace" behavior unnecessary (and by interactively filtering tags you reduce the lines fed to Emacs by readtags, which saves IO and parsing time). But it won't happen soon.

railwaycat commented 2 years ago

Thank you for trying to reproduce this! I retried on a clean Emacs config which has only Citre. I also re-created the tags files from scratch.

The benchmark function from above finished with a similar result for Envoy, which has a 123M tags file: 917 secs, a bit more than 15 mins. However, for a project with a 441M tags file, the same benchmark function took 19550 secs to finish, which is nearly 5 and a half hours. As you said, the bottleneck is on the Elisp side when handling this amount of data.

I didn't see a big difference in the benchmark between different hardware setups. I guess that's because Emacs relies mostly on single-core performance rather than using multiple cores.

Is the plan you mentioned the discussion: https://github.com/universal-ctags/citre/discussions/47? It looks nice and should be a great improvement. For the current project I'm working on, I'm going to write a wrapper function for M-. that, when called on whitespace, prompts for a string and passes it directly to citre-jump without providing the candidates, and keeps the original behavior on non-whitespace.
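
A wrapper along those lines could look roughly like the sketch below. It goes through xref-find-definitions rather than citre-jump, on the assumption that Citre's xref backend accepts a plain string typed at a prompt. my/xref-find-definitions-no-candidates is a hypothetical name, and the sketch has not been tested against a large tags file.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'xref)

(defun my/xref-find-definitions-no-candidates ()
  "Find definitions without building a completion table.
Hypothetical wrapper for illustration, not part of Citre or xref.
With an identifier at point, jump to its definitions directly.
Otherwise read a plain string, so no candidate list is computed."
  (interactive)
  (let ((identifier
         ;; Ask the active xref backend for the identifier at point,
         ;; which is the same check xref makes before it prompts.
         (or (xref-backend-identifier-at-point (xref-find-backend))
             ;; `read-string' offers no completion, so the backend's
             ;; completion table is never requested.
             (read-string "Find definitions of: "))))
    (unless (string= identifier "")
      (xref-find-definitions identifier))))

;; Rebind M-. globally.  Use a mode-specific keymap instead if preferred.
(global-set-key (kbd "M-.") #'my/xref-find-definitions-no-candidates)
```

The wrapper reads the identifier itself instead of using call-interactively on xref-find-definitions, because with the default value of xref-prompt-for-identifier a command other than the stock ones would be prompted with completion even when a symbol is at point.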

Thank you @masatake for bringing in the details of readtags. I learned a lot while working out what the parameters mean. Previously I just used everything through the Citre UI.

AmaiKinono commented 2 years ago

@railwaycat So, on your side, pressing M-. on whitespaces in Envoy takes > 30 min, but Citre spends 15 min building the cache. Does this show that the problem is in helm or your config?

> Is the plan you mentioned the discussion: #47?

Yes. That will be the ultimate ctags user tool ;)

railwaycat commented 2 years ago

Yes, it turns out there must be some configuration on my side, other than Citre, that costs the extra time, so it is not a Citre issue. I may stick with a simple wrapper function for xref-find-definitions without the candidates prompt for now, because pulling all symbols from a large tags file does not look efficient in any way. Thank you very much for the help and information! I will close the issue, and sorry again for the noise.

Looking forward to the tags filtering tool :)

shaohme commented 1 year ago

I have the same performance problem when using vertico: there are way too many reference results for Emacs to deduplicate. @railwaycat I wonder if and how you solved it.