Closed railwaycat closed 2 years ago
Your description is a little bit confusing. You are using xref, not the citre-jump
command, right?
It looks like you are using a big tags file. Could you try this with a smaller tags file?
It might be good to prompt for user input when citre-jump is called on a whitespace character
This is what Citre does. I assume you are using a completion enhancement interface like ivy or helm. Since ivy or helm wants the completions up front for you to choose from, Citre gets all tags in the tags file before you input anything. With a huge tags file, this looks like freezing.
If you use the vanilla Emacs completion UI, it only calculates completions when you press TAB, so you won't have the problem.
To fix it, we need to invent some magic to not block the UI, like what we did with auto-completion.
Unfortunately I think this can't be done. Since there's the "completion style" thing in Emacs, we really can't assume the user input should appear in the results. We need to give Emacs the whole collection and let it decide how to filter it, which is what Citre does for now.
Yes, exactly as you pointed out, it's quite a big tags file, about 400MB. I tried waiting longer, and this time the completion list showed up after about 1 min.
I'm using helm, and after the long wait the first time, the completion list shows almost immediately. I think this is acceptable for me.
Thanks for the rich information; this answers all my questions.
I'm using helm and after the first time long waiting, the completion list shows almost immediately. I think this is acceptable for me.
It's not helm. It's Citre building a cache of the result. I designed it like this so the user only needs to wait once ;)
Glad to know, and thank you for the feature! For the big tags file I'm working with, it really saves time.
I ran into a similar issue again recently.
I'm working on a C++ project whose tags file is about 123M. When I use M-. on whitespace, Emacs gets stuck at 100% CPU for over half an hour and the symbol list still does not show up. I tried several times and can reproduce it every time. Emacs itself is not frozen; I can press C-g to quit the operation at any time.
I tried again on a smaller C project (curl) whose tags file is around 10M. There I can wait until the symbol list shows, but it still takes more than 5 mins. After that, the symbol list shows immediately, so I think the cache works well.
"M-." on a non-whitespace symbol works great without any problem. This issue only occurs the first time after Emacs starts, and only on whitespace.
I would do a test later. I'd like to know:
Thank you!
citre-update-this-tags-file, it will prompt to create a tags file, then I use empty to choose all languages.
brew install emacs, no GUI) on macOS, M1
BTW another project I use ctags and citre on is Envoy (https://github.com/envoyproxy/envoy), which provides a ~100MB tags file. I generated it with the same step as 1.
28.1.50 (build from source code on 06/09) on Linux, a VM on Xeon with performance.
You may be able to launch and use another terminal while the emacs process works hard.
Is the emacs process the only one hogging your CPUs? How about the readtags process? If readtags also works hard, we want to know the command line that emacs specified for launching the readtags process.
The top command is helpful for finding out which processes consume the CPUs.
You can get the pid of readtags from the output of top.
ps -f $the-pid-of-readtags prints the command line.
I can see the emacs and readtags processes hit 100% CPU when I do M-., but after several minutes the readtags process finished, leaving only the emacs process consuming high CPU.
Here is the command line of readtags from ps:
/usr/local/bin/readtags -t /home/<my username>/.cache/tags/!home!<my username>!<my project>.tags -Q (not (and $extras ((string->regexp "(^|,) ?(anonymous)(,|$)" :case-fold false) $extras))) -S (<or> (if (and $name &name) (<> (length $name) (length &name)) 0) (if (and $name &name) (<> $name &name) 0)) -Ene -l
The version of ctags I'm using (I built from source code):
> ctags --version
Universal Ctags 5.9.0(5abc6039), Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
Compiled: Aug 16 2022, 20:31:26
URL: https://ctags.io/
Optional compiled features: +wildcards, +regex, +gnulib_regex, +iconv, +option-directory, +xpath, +json, +interactive, +packcc, +optscript
I would like to know the execution time of readtags and the size of its output.
Could you try the following command line? Note: unlike the command line you kindly reported in the last comment, I added single quote characters to the new command line.
$ time /usr/local/bin/readtags -t /home/<my username>/.cache/tags/!home!<my username>!<my project>.tags -Q '(not (and $extras ((string->regexp "(^|,) ?(anonymous)(,|$)" :case-fold false) $extras)))' -S '(<or> (if (and $name &name) (<> (length $name) (length &name)) 0) (if (and $name &name) (<> $name &name) 0))' -Ene -l | tee /tmp/readtags-$$.tags | wc -l
This command line records the output to /tmp/readtags-$$.tags. The file may be helpful for debugging the emacs/citre side, though it may be impossible to make the file public.
(I know well about readtags. However, my knowledge about the emacs/citre side is limited.)
Output of this command:
> time /usr/local/bin/readtags -t /home/<my username>/.cache/tags/\!home\!<my username>\!<my project>\!.tags -Q '(not (and $extras ((string->regexp "(^|,) ?(anonymous)(,|$)" :case-fold false) $extras)))' -S '(<or> (if (and $name &name) (<> (length $name) (length &name)) 0) (if (and $name &name) (<> $name &name) 0))' -Ene -l | tee /tmp/readtags-$$.tags | wc -l
1783600
/usr/local/bin/readtags -t /home/<my username>/.cache/tags/!home!<my username>!<my project>!.tags - 55.85s user 1.57s system 99% cpu 57.422 total
tee /tmp/readtags-$$.tags 0.14s user 1.09s system 2% cpu 57.422 total
wc -l 0.17s user 0.38s system 0% cpu 57.421 total
The tags file of this project is 441M and the output file /tmp/readtags-14229.tags is 426M.
Sorry, I can't share this file, as you said. But if you would like to see it, I can share the output from Envoy, another open source project I'm currently working on, which has a 123M tags file.
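For anyone who wants to gauge how much of that output is duplicate names without sharing the file, here is a minimal sketch. It assumes the captured readtags output is in the usual tags-file layout, where each line starts with the tag name followed by a TAB; `count_unique_names` is a hypothetical helper name, not part of readtags or Citre.

```shell
# Hypothetical helper: count the distinct tag names in a captured readtags listing.
# Assumption: every line begins with the tag name, followed by a TAB.
count_unique_names() {
    # Take the first TAB-separated field, deduplicate, and count the lines.
    # `tr -d ' '` strips the padding some `wc` implementations add.
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Example: compare with the total line count reported by `wc -l` above.
# count_unique_names /tmp/readtags-14229.tags
```

Comparing this number with the total line count gives a rough idea of how many entries the Emacs side has to collapse when it removes duplicates.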
426M is too large. Is "anonymous" the string you looked for?
It used to be much better. At the time I created this ticket, it usually took 1-2 minutes to show the list of symbols on the first run, on the same project and the same Linux machine.
I believe the behavior of Citre is that when M-. (the shortcut for find definition in Emacs) is used on a whitespace character, Citre loads all symbols into a list for lookup. I know this is a heavy task for a tags file > 100M.
Is "anonymous" the string you looked for?
When I pressed \M-. on a space char, Emacs asked me:
Find definitions of:
I wonder what kind of string you gave for the prompt. My guess is "anonymous". Am I correct?
I see. I have helm installed. Without helm, the behavior should be the same as hitting \<tab> after the Find definitions of: prompt.
Hi, I did some profiling on a clean emacs config with only Citre. I started profiling with (profiler-start 'cpu), hit M-. and then TAB after the prompt Find definitions of:. I waited until the readtags process ended, then waited another short while with Emacs still at 100% CPU.
And here is what I got:
13586 86% - command-execute
13586 86% - call-interactively
13580 86% - funcall-interactively
13576 86% - minibuffer-complete
13576 86% - completion-in-region
13576 86% - completion--in-region
13576 86% - #<compiled -0x1151f601966036ab>
13576 86% - apply
13576 86% - #<compiled -0x4b7deebfa9cb6ab>
13576 86% - completion--in-region-1
13576 86% - completion--do-completion
13576 86% - completion--field-metadata
13576 86% - completion-metadata
13576 86% - #<lambda 0x134667978ed2cf31>
13576 86% - let*
13576 86% - if
13576 86% - let
13576 86% - cl-remove-duplicates
9060 57% - cl--delete-duplicates
1456 9% cl--position
4499 28% - mapcar
4109 26% + citre-get-tags
360 2% + #<lambda -0x32572b19cb8b13c>
It looks like most of the CPU time is spent in xref-backend-identifier-completion-table.
I'm starting to feel this is not a Citre issue, but a behavior of minibuffer completion tools like helm and ivy. I can also try adding ignore directories to reduce the tags file size.
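As a rough sketch of the "ignore directories" idea, assuming the tags file is generated by running ctags by hand rather than through Citre's prompt: Universal Ctags accepts `--exclude=PATTERN` to skip matching files and directories. `exclude_args` is a hypothetical helper, and the directory names in the example are placeholders.

```shell
# Hypothetical helper: turn directory names into Universal Ctags --exclude options,
# one option per line.
exclude_args() {
    for dir in "$@"; do
        printf '%s%s\n' '--exclude=' "$dir"
    done
}

# Example (placeholder directory names; relies on word splitting, so the
# names must not contain spaces):
# ctags -R $(exclude_args third_party build) -f .tags .
```

A smaller tags file means fewer lines for readtags to emit and for Emacs to parse and deduplicate.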
I can't reproduce this.
On my machine, the curl tags file is about 10MiB, and Citre builds the cache in 20 secs. On Envoy (tags file is about 120MiB) it takes 12 min and 30 secs. And my machine is much crappier than yours.
My OS is Arch Linux. I use Emacs 28.1 and ctags 5.9.0(p5.9.20210905.0), all installed from the Arch official repository.
I do suggest you try to build the cache in a clean Emacs config. You can do this by evaluating:
(benchmark-run
    (cl-remove-duplicates
     (mapcar
      (lambda (tag) (citre-get-tag-field 'name tag))
      (citre-get-tags
       "Tags file path here" nil nil
       :filter citre-xref--filter
       :sorter (citre-core-sorter '(length name +) 'name)
       :require '(name)))
     :test #'equal))
On my machine and on the curl tags file, it takes exactly 19 secs.
@masatake suspected it's a readtags problem. On my machine readtags is actually very fast. The command line Citre uses to build the cache is
$ readtags -t .tags -Q '(not (and $extras ((string->regexp "(^|,) ?(anonymous)(,|$)" :case-fold false) $extras)))' -S '(<or> (if (and $name &name) (<> (length $name) (length &name)) 0) (if (and $name &name) (<> $name &name) 0))' -Ene -l
(.tags is my tags file name)
On curl it takes like 2 secs. On Envoy it takes < 4 secs. The bottleneck is on the Elisp side. It's because Emacs pipe IO and Elisp itself are dog slow.
I do plan to create an interactive & asynchronous tags filtering tool for Citre, which should make this "xref on whitespace" behavior unnecessary (and by interactively filtering tags you reduce the lines fed to Emacs by readtags, which saves IO and parsing time). But it won't happen soon.
Thank you for trying to reproduce it! I retried on a clean Emacs config which has only Citre. I also re-created the tags files from scratch.
The benchmark function from above finished with a similar result for Envoy, which has a 123M tags file: 917 secs, that is a bit more than 15 mins. However, for a project with a 441M tags file, the same benchmark function took 19550 secs to finish, which is nearly 5 and a half hours. As you said, the bottleneck is on the Elisp side when handling this amount of data.
I didn't see a big difference in the benchmark between different hardware setups. I guess it's because Emacs relies mostly on single-core performance rather than using multiple cores.
Is the plan you mentioned the discussion: https://github.com/universal-ctags/citre/discussions/47? It looks nice and should be a great improvement. For the project I'm currently working on, I'm going to write a wrapper function for M-. that, when called on whitespace, prompts for a string and passes it directly to citre-jump without providing the candidates, and keeps the original behavior on non-whitespace.
Thank you @masatake for bringing in the details of readtags. I learned a lot while working to understand the parameters. Previously I just used everything through the Citre UI.
@railwaycat So, on your side, pressing M-. on whitespace in Envoy takes > 30 min, but Citre spends 15 min building the cache. Does this show that the problem is in helm or your config?
Is the plan you mentioned the discussion: #47?
Yes. That will be the ultimate ctags user tool ;)
Yes, it turns out there must be some configuration on my side other than Citre that costs the extra time, so it is not a Citre issue. I may stick with a simple wrapper function for xref-find-definitions without the candidates prompt for now, because pulling all symbols from a large tags file does not look efficient in any way.
Thank you very much for the help and information! I will close the issue, sorry for the noise again.
Looking forward to the tags filtering tool :)
I have the same performance problem with way too many reference results for Emacs to sort duplicates through when using vertico. @railwaycat I wonder if and how you solved it
I use the default setup, with citre-jump used as the backend of xref. When I do M-. (xref-find-definitions) on a whitespace character, Emacs freezes and does not accept any input until a C-g.
It might be good to prompt for user input when citre-jump is called on a whitespace character; as I remember, this is quite common behavior for the xref backends of etags and GNU global.