smontanari / code-forensics

A toolset for code analysis and report visualisation

Large Codebase Hotspot Analysis breaks with ENOMEM: not enough memory #19

Closed: jeskoriesner closed this issue 5 years ago

jeskoriesner commented 6 years ago

I am trying to analyse a large code base with 100k+ revisions. The process gets interrupted with the not enough memory error. I looked around and tried to solve the error by giving node a higher memory limit with --max-old-space-size, but it does not matter: the process always terminates at the same location. I also tried to analyse *.java files only, but the process still terminates. I could allocate almost 20GB of memory to the process, but I am not sure how. Any suggestions?

I have attached the console log for reference: analysis.zip

smontanari commented 6 years ago

If the memory issue were due to the large number of revisions, my suggestion would be to avoid analysing the entire history of the codebase in one go. However, looking at the log file, the task that's failing is not the vcs-log-dump but the sloc-report, which seems to run out of memory after reading about 6600 files. It could be that nodejs cannot garbage collect in time all the stream objects it creates to analyse the files through sloc.

Having said that, if you can actually allocate 20GB of memory to the process, the size in bytes of your codebase would have to be many GB of data, which is hard to believe for only 6600 files (unless every file is 3+ MB of content). If you're analysing a public repo I can give it a go and try to dig deeper, otherwise it's going to be hard to replicate the issue. Alternatively, if you could spend some time on this, I would use a memory profiling tool to understand whether it's a genuine lack-of-memory problem or, more likely, a memory leak or a bug in nodejs itself.

As far as code-forensics is concerned, the code that runs the sloc report is pretty straightforward: it opens each file and streams its content to the sloc function, and garbage collection should take care of cleaning up all the memory used to perform that task.
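For reference, this is roughly the shape of that per-file step: a simplified sketch using the sloc npm package and a synchronous read instead of a stream, not the actual code-forensics source, and the file path is hypothetical.

const fs = require('fs');
const path = require('path');
const sloc = require('sloc');

// Read one file, hand its content to sloc, and return the line-count stats.
// Everything allocated here goes out of scope afterwards, so it should be
// eligible for garbage collection.
function slocStats(file) {
  const code = fs.readFileSync(file, 'utf8');
  const extension = path.extname(file).slice(1) || 'js';
  return sloc(code, extension); // object with total/source/comment counts etc.
}

console.log(slocStats('src/Example.java')); // hypothetical path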

smontanari commented 6 years ago

@jeskoriesner have you tried to just run (from your project root):

$ node --max-old-space-size=<mem size> ./node_modules/.bin/gulp...

If that helps please let me know and I will add this to the troubleshooting documentation.
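For example, with an 8 GB heap and the hotspot analysis task (the memory value is only an illustration; add whatever task parameters you normally use):

$ node --max-old-space-size=8192 ./node_modules/.bin/gulp hotspot-analysis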

jeskoriesner commented 6 years ago

@smontanari I have used the --max-old-space-size option, but the outcome did not change, no matter how much memory I allocated. So I do not think the memory is really allocated to the process that is failing. It is a company repo, which I cannot share. The files are not extraordinarily large either, so I cannot make sense of what is happening. I was wondering if a sub-process was unable to get the allocated memory?

smontanari commented 6 years ago

@jeskoriesner all I can recommend is to break down the analysis steps and try to isolate the problem. In your case the hotspot-analysis requires the following tasks to execute before it: vcs-log-dump, sloc-report, revision-report. Try executing them individually (maybe with the debug flag turned on, as in the example below) and see if any one in particular triggers the issue. If not, then the hotspot-analysis task itself could be where things break. This is as much as I can suggest to help.
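For example (illustrative only; keep the memory flag and whatever parameters you normally pass to your analysis):

$ node --max-old-space-size=8192 ./node_modules/.bin/gulp vcs-log-dump
$ node --max-old-space-size=8192 ./node_modules/.bin/gulp sloc-report
$ node --max-old-space-size=8192 ./node_modules/.bin/gulp revision-report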

smontanari commented 5 years ago

The latest release of code-forensics should help by limiting the number of Event Emitters created at any point in time, which is known to possibly cause memory leaks when a large number of tasks execute in parallel. I'd be curious to know whether it also helps you @jeskoriesner with your particular issue.
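For anyone hitting similar limits, the general idea is to bound how much work is in flight at any one time, for example by processing files in fixed-size batches. Below is a rough sketch of that pattern only, not the actual change in the release; analyseFile is a hypothetical worker function.

const fs = require('fs').promises;

// Process items in fixed-size batches so that at most `batchSize`
// asynchronous operations (and their buffers/emitters) are alive at once,
// keeping peak memory bounded.
async function mapInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}

// Usage sketch (analyseFile is hypothetical):
// mapInBatches(allFiles, 50, file => fs.readFile(file, 'utf8').then(analyseFile));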

smontanari commented 5 years ago

Closing this issue for now; not sure if it's still a problem for anyone.