ycm-core / YouCompleteMe

A code-completion engine for Vim
http://ycm-core.github.io/YouCompleteMe/
GNU General Public License v3.0

Large RAM usage when loading tags files #595

Closed zhaocai closed 8 years ago

zhaocai commented 10 years ago

My Mac has 16 GB of memory installed, so ycmd is using about 2.5 GB. Is this normal behavior? (There is no error output in the log.)

USER      PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
zhaocai 94041   0.0 16.5  5298224 2772100   ??  S     1:11PM   0:05.39 /usr/bin/python /Users/zhaocai/.vim/bundle/YouCompleteMe/python/ycm/server/ycmd.py --port=64315 --options_file=/var/folders/w8/w9m8r3091wj_kk13dhdvj_nc0000gn/T/tmpo0zt31 --log=error
Valloric commented 10 years ago

Are you editing C-family code like C, C++, ObjC?

zhaocai commented 10 years ago

No, just some JSON, Vim, Ruby, Markdown, Python, etc. The only related thing I can think of is that I like to open many files (tabs) across 3 to 6 separate MacVim windows.

My system just hung for a few minutes and I had to kill the ycmd server. I then noticed that multiple ycmd servers were running, each of them taking a huge chunk of memory.

USER      PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
zhaocai 83002  99.3 11.5  4860652 1933664   ??  R     4:57PM   1:30.17 /usr/bin/python /Users/zhaocai/.vim/bundle/YouCompleteMe/python/ycm/server/ycmd.py --port=57761 --options_file=/var/folders/w8/w9m8r3091wj_kk13dhdvj_nc0000gn/T/tmpuiluw3 --log=error
zhaocai 76242   0.1 11.8  6558156 1977012   ??  S     4:54PM   0:07.34 /usr/bin/python /Users/zhaocai/.vim/bundle/YouCompleteMe/python/ycm/server/ycmd.py --port=56777 --options_file=/var/folders/w8/w9m8r3091wj_kk13dhdvj_nc0000gn/T/tmpy5dotK --log=error
zhaocai 35491   0.0 18.6  6532932 3128512   ??  S     4:34PM   0:08.39 /usr/bin/python /Users/zhaocai/.vim/bundle/YouCompleteMe/python/ycm/server/ycmd.py --port=64673 --options_file=/var/folders/w8/w9m8r3091wj_kk13dhdvj_nc0000gn/T/tmpwWO58f --log=error

(18.6 + 11.5 + 11.8)*16/100 = 6.7 GB
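The same total can be cross-checked from the RSS column rather than from %MEM. The following is a minimal Python sketch, assuming `ps` reports RSS in kilobytes; the helper name `total_rss_gib` is hypothetical, and the sample lines are abbreviated from the output above.

```python
def total_rss_gib(ps_lines):
    """Sum the RSS column (6th field, in KiB) of `ps aux`-style lines, in GiB."""
    total_kib = 0
    for line in ps_lines:
        fields = line.split()
        # Fields: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND...
        if len(fields) < 6 or not fields[5].isdigit():
            continue  # skip the header row and malformed lines
        total_kib += int(fields[5])
    return total_kib / (1024 * 1024)


# Abbreviated copies of the three ycmd rows shown above.
sample = [
    "USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND",
    "zhaocai 83002 99.3 11.5 4860652 1933664 ?? R 4:57PM 1:30.17 python ycmd.py",
    "zhaocai 76242 0.1 11.8 6558156 1977012 ?? S 4:54PM 0:07.34 python ycmd.py",
    "zhaocai 35491 0.0 18.6 6532932 3128512 ?? S 4:34PM 0:08.39 python ycmd.py",
]
print(f"{total_rss_gib(sample):.2f} GiB")  # prints "6.71 GiB"
```

Both routes agree: the three ycmd processes together hold roughly 6.7 GB of resident memory.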

The ycmd server should not have multiple instances, am I right?

Valloric commented 10 years ago

> The ycmd server should not have multiple instances, am I right?

There shouldn't be, correct. There was a regression yesterday that prevented ycmd from shutting down correctly; I fixed it last night. You probably ran a bad build. Sync and that issue should go away.

WRT memory consumption... I don't know. I'm surprised it's taking that much memory for non-C-family files. I'll look into it.

Valloric commented 10 years ago

Nope, I can't repro this for non-C-family files. No matter how many Python files I open, I can't get it to go above 100 MB usage.

zhaocai commented 10 years ago

I did some testing and identified the issue. It is not about C-family versus non-C-family files; it is rooted in the tags identifier module. Once I set `let g:ycm_collect_identifiers_from_tags_files = 0`, the memory usage of the ycmd server stabilizes at about 60 MB.

Because memory usage skyrockets from 20 MB to 2 GB within a second once I open a Python or Vim file, I suspect that the module handling tag identifiers has a memory leak. I do not know whether you use ctags or some internal code to generate the tags. On my computer, the ctags I get from fishman/ctags generates a 1.5 GB tags file from the YouCompleteMe repo, mainly from BoostParts.

Can you check it? Thanks.

zhaocai commented 10 years ago

Can you test with this tags file?

Valloric commented 10 years ago

> Because memory usage skyrockets from 20 MB to 2 GB within a second once I open a Python or Vim file, I suspect that the module handling tag identifiers has a memory leak.

I think a more reasonable conclusion is that you have a very big tags file. Open the file that causes the RAM usage to skyrocket and then do :echo tagfiles(). Then examine those files.

> I do not know whether you use ctags or some internal code to generate the tags. On my computer, the ctags I get from fishman/ctags generates a 1.5 GB tags file from the YouCompleteMe repo, mainly from BoostParts.

If you're loading such a tags file, then that would be the reason for the RAM usage.

zhaocai commented 10 years ago

I have double-checked. The only tags file I have is the one I just uploaded (tags), which is 27 MB. It should not cause a 2 GB spike in memory usage.

I mentioned the 1.5 GB tags file just for your reference. It was an issue I identified a long time ago, and I have since excluded the related files.

Can you check with this tags file in your system?

zhaocai commented 10 years ago

And this tags file (7.7 MB), generated from the YouCompleteMe source code, spikes memory usage to 1.2 GB.

Valloric commented 10 years ago

> And this tags file (7.7 MB), generated from the YouCompleteMe source code, spikes memory usage to 1.2 GB.

I can repro this. Looking into it.

zhaocai commented 10 years ago

Thanks. Just for your reference: I have disabled the tag completer, and ycmd now uses about 80 MB of memory on my system.

Valloric commented 10 years ago

I can repro this with master, so it's not a regression on the ycmd branch. This needs to be fixed, sure, but since it's not a regression I don't see it as a blocker for ycmd being merged into master.

zhaocai commented 10 years ago

any progress on this matter? (´・_・`)

Valloric commented 10 years ago

Not really, no. It's on the back burner, I have a ton of work.

zhaocai commented 10 years ago

I took a quick look at the code. The only thing I can think of is the boost::regex_search call. It might be better to use line-based iteration, because TAG_REGEX does start with ^.

Is it possible that the whole file_content (7 MB, for example) is repeatedly copied in the while loop without being released?
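The line-based iteration suggested above could look roughly like the following. This is a minimal Python sketch, not ycmd's actual C++ implementation: it assumes the usual ctags layout (identifier, tab, file path, tab, ex command), the function name `identifiers_from_tags` is hypothetical, and the sample paths are illustrative only.

```python
from collections import defaultdict


def identifiers_from_tags(lines):
    """Map each source file to the set of identifiers tagged in it.

    Consumes one line at a time, so the whole tags file never has to be
    held in memory as a single string.
    """
    by_file = defaultdict(set)
    for line in lines:
        # Lines starting with "!_TAG_" are ctags metadata, not identifiers.
        if line.startswith("!_TAG_"):
            continue
        parts = line.rstrip("\n").split("\t", 2)
        if len(parts) < 3:
            continue  # not a well-formed tag line
        identifier, path, _rest = parts
        by_file[path].add(identifier)
    return dict(by_file)


# Illustrative tag lines; the paths are examples, not taken from the thread.
sample = [
    "!_TAG_FILE_FORMAT\t2\t/extended format/\n",
    "Candidate\tcpp/ycm/Candidate.h\t/^class Candidate {$/;\"\tc\n",
    "LetterNode\tcpp/ycm/LetterNode.h\t/^class LetterNode {$/;\"\tc\n",
]
print(identifiers_from_tags(sample))
```

Passing an open file object (for example `with open("tags") as f: identifiers_from_tags(f)`) streams the file line by line, which avoids the repeated whole-buffer copies speculated about above.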

zhaocai commented 10 years ago

the large amount of memory usage is normal

I have reached the conclusion that the large amount of memory usage is normal after I dumped and analyzed the ycmd trace.

Below are some observations and suggestions:

  1. Memory usage mainly comes from adding identifiers from tag files. As the snapshot below shows, most of the memory allocations come from creating new Candidates. (Image: ycmd memory analysis)

The reason for the large memory usage is the size of each Candidate: 1.50 KB. I read the code and noticed that each letter is associated with a LetterNode. Even so, 1.50 KB is larger than expected. I am not sure there is an easy way to do something about this.


#     Address     Category     Timestamp     Live     Size     Responsible Library     Responsible Caller
461776     0x7feaf4e33600     Malloc 1.50 KB     00:52.684.957     •     1.50 KB     ycm_core.so     YouCompleteMe::Candidate::Candidate(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
461777     0x7feaf4e33c00     Malloc 1.50 KB     00:52.684.959     •     1.50 KB     ycm_core.so     YouCompleteMe::LetterNode::LetterNode(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
461778     0x7feaf4e34200     Malloc 1.50 KB     00:52.684.962     •     1.50 KB     ycm_core.so     YouCompleteMe::LetterNode::LetterNode(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
461779     0x7feaf4e34800     Malloc 1.50 KB     00:52.684.964     •     1.50 KB     ycm_core.so     YouCompleteMe::LetterNode::LetterNode(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
461780     0x7feaf4e34e00     Malloc 1.50 KB     00:52.684.966     •     1.50 KB     ycm_core.so     YouCompleteMe::LetterNode::LetterNode(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
461781     0x7feaf4e35400     Malloc 1.50 KB     00:52.684.970     •     1.50 KB     ycm_core.so     YouCompleteMe::LetterNode::LetterNode(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
461782     0x7feaf4e35a00     Malloc 1.50 KB     00:52.684.972     •     1.50 KB     ycm_core.so     YouCompleteMe::LetterNode::LetterNode(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
461783     0x7feaf4e36000     Malloc 1.50 KB     00:52.684.974     •     1.50 KB     ycm_core.so     YouCompleteMe::LetterNode::LetterNode(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)
461784     0x7feaf4e36600     Malloc 1.50 KB     00:52.684.978     •     1.50 KB     ycm_core.so     YouCompleteMe::LetterNode::LetterNode(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)

zhaocai commented 10 years ago

  2. I am not sure about the purpose of LetterNode. My best guess is that it is for fuzzy matching. Since each letter takes 32 bytes, I think the memory footprint of this data structure is quite large. Is it possible to load it on demand? In the memory trace I dumped, nearly half of the memory was taken by LetterNode.

zhaocai commented 10 years ago

In short, ycmd works well without tag identifiers because it only deals with the files opened in Vim (usually fewer than 10); with tag identifiers, it is like handling hundreds of files opened in Vim at the same time.
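To get a feel for the scale, here is a rough back-of-envelope estimate in Python. It assumes, purely for illustration, that every unique identifier costs one 1.50 KB allocation as seen in the trace above and ignores all other overhead; the function name `estimated_mib` and the identifier count are hypothetical.

```python
def estimated_mib(unique_identifiers, bytes_per_candidate=1536):
    """Rough memory estimate in MiB, assuming one fixed-size allocation each."""
    return unique_identifiers * bytes_per_candidate / (1024 * 1024)


# Hypothetical project whose tags file yields 100,000 unique identifiers.
print(f"{estimated_mib(100_000):.1f} MiB")  # prints "146.5 MiB"
```

Under that assumption the cost grows linearly with the number of identifiers, which is why a tags file covering a large source tree dominates ycmd's memory.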

Valloric commented 10 years ago

> Memory usage mainly comes from adding identifiers from tag files

That's actually what I assumed to be the case, which is why I didn't want to tackle this issue now.

> Even so, 1.50 KB is larger than expected.

Agreed.

> I am not sure there is an easy way to do something about this.

Agreed with that too.

> I am not sure about the purpose of LetterNode. My best guess is that it is for fuzzy matching.

Yes, it's for fast subsequence matching over large identifier sets. The algorithm is a bit complicated, but I have a mathematical proof that there's no algorithm with a better big-O complexity. Constant factors OTOH could probably be improved, but it's super fast as it is.
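For readers unfamiliar with the term, subsequence matching means the query characters must appear in the candidate in order, but not necessarily next to each other. The sketch below is a plain linear scan in Python, not the LetterNode-based algorithm YCM uses; the case-insensitive comparison and the name `is_subsequence` are assumptions made for illustration.

```python
def is_subsequence(query, candidate):
    """Return True if the characters of `query` appear in `candidate` in order."""
    remaining = iter(candidate.lower())
    # `ch in remaining` advances the iterator past the first match, so each
    # query character must be found after the previous one.
    return all(ch in remaining for ch in query.lower())


identifiers = ["LetterNode", "Candidate", "letternode_per_text_index_"]
print([name for name in identifiers if is_subsequence("ltn", name)])
# prints ['LetterNode', 'letternode_per_text_index_']
```

A scan like this touches every character of every candidate on each query, which is the cost a precomputed per-letter index trades memory to avoid.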

Eventually I'll take a look at what can be done to make those data-structures smaller.

zhaocai commented 10 years ago

Thanks.

Meanwhile, you could write a FAQ entry suggesting that users exclude less relevant folders from ctags, such as BoostParts, llvm, gmock, and third_party in the YouCompleteMe source.
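One way to apply that suggestion is to pass `--exclude` patterns when generating the tags file. The Python sketch below only builds the command line; it assumes Exuberant Ctags or Universal Ctags (both accept `-R`, `-f`, and `--exclude=`), and the helper name `ctags_command` is hypothetical.

```python
import shlex


def ctags_command(excluded_dirs, output="tags"):
    """Build a recursive ctags invocation that skips the given directories."""
    args = ["ctags", "-R", "-f", output]
    args += [f"--exclude={name}" for name in excluded_dirs]
    args.append(".")  # index the current directory
    return args


cmd = ctags_command(["BoostParts", "llvm", "gmock", "third_party"])
print(shlex.join(cmd))
```

To actually generate the file, the list can be handed to `subprocess.run(cmd, check=True)` from the project root.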

eapache commented 10 years ago

I was looking at this, and I believe letternode_per_text_index_ can be removed from LetterNode. It is only used in the constructor, so why not just instantiate a local copy there? I see it's used with shared_ptr right now. My Boost is a bit rusty, but couldn't that just be moved to `letters` or something?

Thanatermesis commented 10 years ago

I'm having the same problem: multiple ycmd instances after closing the files, and HUGE memory usage.

More exactly: 480 MB of RAM for 35,000 lines of ctags files (2.7 MB of ctags files)

(editing C source code files)

boulund commented 9 years ago

I just had the same issue with multiple ycmd instances after editing several files (Python only, but a rather big project) and closing down vim. Enormous memory usage (75GiB) exceeding my RAM (64GiB) dragged my system to mud-crawling speeds with all the swapping going on. After I killed the ycmd instances and started editing again I'm down to a more reasonable memory consumption (2.5GiB).

gergap commented 9 years ago

Hi, I had the same issue. The system was pretty much unusable since the YCM Python process was eating up 4 GB of RAM... I removed my hardcoded tag files from .vimrc to work around this issue. Now the python2 process consumes 930 MB of RAM with just one tag file (14,000 tags). This works but is not desirable. I dug into the problem and found this bug report. I also had a look at the code and thought about how this could be fixed.

But this is a design issue and cannot be easily fixed as a bug, so I prototyped an alternative solution. I've put a fuzzy matching algorithm that is damn fast and very light on resources on GitHub here: https://github.com/gergap/fuzzysearch

For small and fast code, OOP code using smart pointers etc. is overkill IMO. For this reason I used C rather than C++ and designed the code to use small data structures in arrays, thus avoiding lots of memory allocations, fragmented memory, etc. Iterating over arrays of data is very fast, because it benefits from lots of cache hits...

My test app consumes only 1 MB of RAM with the tag file for which YCM needs 930 MB. This is not a fair comparison, because YCM implements a lot of other stuff and includes Python; nevertheless, it consumes much, much less RAM.

The algorithm works differently from yours, so it is not limited to consecutive letters, which can be an advantage IMO, but this is very subjective. Just give it a try with my test application.

I would be glad if you could integrate this into YCM. Any feedback is welcome.

regards, Gerhard.

phcerdan commented 8 years ago

Same problem here: 9 GB in one or two python2 processes with a big tag file. I had to set `let g:ycm_collect_identifiers_from_tags_files = 0`.

micbou commented 8 years ago

PR #2225 brought a big reduction in memory usage for identifiers and in particular tags. While there is always room for improvement in this area, I think we can mark this issue as resolved.

ghost commented 7 years ago

Same issue for me with a single Python file :( Any solution? Please share.

vheon commented 7 years ago

@sanjay555 Well, I guess it depends on the file. How big is it? Where do you see the memory consumption: in the ycmd process or in another process? Maybe it is jedi that requires the RAM.

ghost commented 7 years ago

@vheon this solved my problem :

let g:ycm_server_use_vim_stdout = 1
let g:ycm_server_log_level = 'debug'

👍 Thank you.

puremourning commented 7 years ago

That's.... surprising. I can't see any reason why that would reduce the RAM usage (other than from log file buffering, but TBH IIRC telling to use Vim stdout doesn't actually do much).

vheon commented 7 years ago

How is that possible? We removed the g:ycm_server_use_vim_stdout https://github.com/Valloric/YouCompleteMe/commit/186aa6b

vheon commented 7 years ago

@sanjay555 Are you sure about that? Do you consistently see the high RAM usage with a Python file (i.e. you tried something like 5-10 times), and not see it with those options set?

ghost commented 7 years ago

@vheon Yep, that solved the issue. Surprising.

assslove commented 6 years ago

This problem still exists. What can I do?