oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

Indexing takes a very long time, approximately 12000 min. ver 0.11 #822

Closed jiungbaek closed 5 years ago

jiungbaek commented 10 years ago

Ubuntu 12.04.2 LTS
32 cores, 80G memory
src size: 60~70G (Android)
java 1.7.0_51
python 2.7.3

$ ctags --version
Exuberant Ctags 5.9~svn20110310

git: 1.8.3.4

In the OpenGrok script:

OPENGROK_VERBOSE="true"
OPENGROK_SCAN_DEPTH="99"

JAVA_OPTS="${JAVA_OPTS:--Xms4g -Xmx25g -XX:NewRatio=3 -XX:PermSize=2g -XX:MaxPermSize=2g -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=10 -XX:TargetSurvivorRatio=90 -Dnet.sf.ehcache.skipUpdateCheck=true -XX:+DoEscapeAnalysis -XX:+UseConcMarkSweepGC -XX:-TraceClassUnloading -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -XX:ParallelGCThreads=32 -XX:MaxTenuringThreshold=0}"

The indexing process usually uses 10~20G of physical memory, even though I set virtual memory to 50G.


Hi. I use OpenGrok to search the Android source.

I have had a problem since the KK (KitKat) version.

Usually, a first indexing run takes approximately 1000~1200 min.

However, with the newer sources, indexing now takes 12000 min or more.

I tested on other servers (with slightly different specs), and the indexing time is the same.

Something is weird. Could you please provide some help?

If you guys need more information, please let me know.

Thank you.

jiungbaek commented 10 years ago

I found the cause of this problem.

A few git repositories have a very large number of commits (for example, 150000 commits),

so I want to limit the history that is read, using git log like this: git log -n 1000.

Is there any way to set this in the OpenGrok script?

Or is the only option to change the source and rebuild?
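
To see which repositories carry the very large histories described above, something like the following could be used. It is only a sketch: the function name commit_counts and the GIT override variable are made up for illustration, and it assumes each repository has a .git entry directly under its working directory.

```shell
#!/bin/sh
# Sketch: print "<commit count><TAB><repository path>" for every git
# repository found under a source root, largest history first.
# GIT and commit_counts are illustrative names, not part of OpenGrok.
GIT="${GIT:-git}"

commit_counts() {
    # $1: source root to scan
    find "$1" -name .git -prune | while IFS= read -r gitdir; do
        repo="${gitdir%/.git}"
        # rev-list --count prints the number of commits reachable from HEAD;
        # repositories where this fails (e.g. no commits yet) are skipped.
        count=$(cd "$repo" && "$GIT" rev-list --count HEAD 2>/dev/null) || continue
        printf '%s\t%s\n' "$count" "$repo"
    done | sort -rn
}

# When run as a script with a source root argument, show the 20 largest.
if [ "$#" -gt 0 ]; then
    commit_counts "$1" | head -n 20
fi
```

Run against the source root, the repositories at the top of the list are the likely candidates for slow history processing.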

cyberplant commented 10 years ago

I think you can create a git wrapper script that runs your git log command whenever OpenGrok asks git for the log, and tell OpenGrok to use that script instead of the git binary. I haven't tested it because I'm on mobile, but it should work.

Also, if you don't care about history, you can disable it completely.
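
A minimal, untested sketch of the wrapper idea suggested above might look like this. The names REAL_GIT, MAX_COMMITS and git_wrapper are made up for illustration, /usr/bin/git is an assumed location of the real binary, and the script assumes that log is the first argument when history is requested. How OpenGrok is pointed at the wrapper is not covered here.

```shell
#!/bin/sh
# Sketch of a git wrapper that caps "git log" at the most recent commits.
# REAL_GIT, MAX_COMMITS and git_wrapper are illustrative names.
REAL_GIT="${REAL_GIT:-/usr/bin/git}"   # assumed path of the real git binary
MAX_COMMITS="${MAX_COMMITS:-1000}"     # how many commits to keep

git_wrapper() {
    if [ "$1" = "log" ]; then
        # Re-issue the log command with a commit limit ahead of the
        # caller's own arguments.
        shift
        "$REAL_GIT" log -n "$MAX_COMMITS" "$@"
    else
        # Every other subcommand is passed through unchanged.
        "$REAL_GIT" "$@"
    fi
}

# Dispatch only when called with arguments, as a real git invocation would be.
if [ "$#" -gt 0 ]; then
    git_wrapper "$@"
fi
```

With such a wrapper in place, any history older than the last MAX_COMMITS commits of a repository would not be visible to the caller, so this trades history completeness for indexing time.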

vladak commented 10 years ago

Problem-wise this is identical to #802; however, the idea of using only the last n commits is interesting.

tarzanek commented 10 years ago

@jiungbaek can you upgrade to 0.12? Your JVM hacks should not be needed anymore, and there is some tuning you can do as in #802 (see the docs, README).

vladak commented 10 years ago

Or better, to 0.12.1.

naseer commented 9 years ago

Did forcing git log -n 1000 work for anyone? I want to try this out.

tulinkry commented 5 years ago

Version 1.1.2 uses a different approach to parallelization, so the indexing time should be much shorter. I suggest upgrading. A candidate for closing.