oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

indexer unexpectedly killed on Linux #2799

Closed: cross closed this issue 5 years ago

cross commented 5 years ago

Possibly related to #2798: with many gigabytes across 12 repositories within my opengrok source dir, I have recently been seeing the indexing job run for about 45 minutes and then die.

Console shows:

11:08:10 INFO: Scanning for repositories...
11:08:11 INFO: Done scanning for repositories, found 1 repositories (took 825 ms)
11:08:11 INFO: Generating history cache for all repositories ...
11:08:11 INFO: Creating historycache for 1 repositories
11:08:11 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
11:41:42 INFO: Done historycache for /disk/src (took 0:33:31)
11:41:42 INFO: Done historycache for all repositories (took 0:33:31)
11:41:42 INFO: Done...
11:41:42 INFO: Starting indexing
11:41:43 INFO: Waiting for the executors to finish
11:41:43 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
11:41:43 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
11:41:43 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
11:41:43 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
Killed

Note: there was a gap of about 15 minutes between the last timestamped log line and the "Killed" message, at which point the process exited.

I am reporting this now because, while trying to investigate, I pruned my source tree down to 6 repositories/projects. The output is now quite different, suggesting the problem will not occur with the reduced tree:

13:07:38 INFO: Scanning for repositories...
13:07:39 INFO: Done scanning for repositories, found 1 repositories (took 828 ms)
13:07:39 INFO: Generating history cache for all repositories ...
13:07:39 INFO: Creating historycache for 1 repositories
13:07:39 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
13:07:49 INFO: Done historycache for /disk/src (took 10.276 seconds)  
13:07:49 INFO: Done historycache for all repositories (took 10.346 seconds)
13:07:49 INFO: Done...
13:07:49 INFO: Starting indexing
13:07:51 INFO: Waiting for the executors to finish
13:07:54 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
13:07:54 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
13:07:54 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
13:07:54 INFO: Creating historycache for /disk/src (PerforceRepository) without renamed file handling
13:08:18 INFO: Done historycache for /disk/src (took 23.919 seconds)  
13:08:18 INFO: Starting traversal of directory /projectA
13:08:18 INFO: Starting indexing of directory /projectA
13:08:18 INFO: Done historycache for /disk/src (took 24.582 seconds)
13:08:18 INFO: Starting traversal of directory /projectL
13:08:19 INFO: Done historycache for /disk/src (took 24.824 seconds)
13:08:19 INFO: Starting traversal of directory /projectD
13:08:19 WARNING: Error from ctags: ctags: Warning: Language "clojure" already defined
13:08:19 WARNING: Error from ctags: ctags: Warning: Language "rust" already defined
13:08:19 WARNING: Error from ctags: ctags: Warning: Language "pascal" already defined
13:08:19 INFO: Done historycache for /disk/src (took 25.638 seconds)
13:08:19 INFO: Starting traversal of directory /projectF
13:08:20 INFO: Starting indexing of directory /projectF
...

Is the issue simply that I have too many repositories? Is it disk allocation, or memory?

This system has 16GB of RAM, and I am running with -Xms4g -Xmx14g. At points in the past I have gotten errors about running out of memory, but I am not seeing any of those recently. Is there any way to tell what is killing these jobs?

(I'm running on a virtual x86_64 system, Ubuntu 18.04)
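
As a rough back-of-the-envelope check (a sketch only, using generic Linux tools rather than anything OpenGrok-specific): a 14GB maximum heap on a 16GB machine leaves only about 2GB for the JVM's native overhead, the ctags subprocesses the indexer spawns, and the rest of the OS, so memory pressure seems plausible.

free -h                      # physical memory and swap actually available
ps -C java -o pid,rss,cmd    # resident size of the indexer JVM while it runs

The indexer invocation itself is elided here; the point is only that an -Xmx close to physical RAM leaves little room for everything outside the heap.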

vladak commented 5 years ago

The indexer never kills itself; that would be just funny. Sounds like the Linux OOM killer to me. https://dev.to/rrampage/surviving-the-linux-oom-killer-2ki9 is among the top hits in the search engine I am using. If you want to be sure what killed the process, there are some options: https://stackoverflow.com/questions/726690/what-killed-my-process-and-why but it seems dtrace, SystemTap, or the like would get the answer right away.
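
If it really is the OOM killer, the kernel log will say so; something along these lines should confirm it (a sketch only, using standard Linux commands, and the exact message wording varies between kernel versions):

sudo dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
# on systemd-based systems such as Ubuntu 18.04, the kernel ring buffer is also in the journal:
sudo journalctl -k | grep -iE 'out of memory|oom-killer|killed process'

A matching line naming the java process would point straight at the OOM killer.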

cross commented 5 years ago

Thanks. I knew the OOM killer might be involved, but didn't know how to trace it. I can see now that that's what was happening. We can close this, cause found. 👍