Open vladak opened 11 years ago
attachment 618:
D:\grok\opengrok>java -Xms1024m -Xmx1024m -jar "D:\grok\opengrok\opengrok.jar" -T 20 -H -m 150000 -v -c "D:\grok\ctags57\ctags.exe" -P -p /2.0.01 -Q off -S -w search -i .cvsignore -i CVSROOT -i .svn -i jars -i package -i classes -i TeamNews4Setup -i Codecs -i .0 -i .000 -i .001 -i .002 -i .003 -i .004 -i .1 -i .3 -i .a -i .a1 -i .a2 -i .abc -i .ac -i .aco -i .alias -i .am -i .aps -i .asa -i .asc -i .asx -i .avi -i .ax -i .bas -i .bin -i .bmp -i .bmp_ -i .bpi -i .bpl -i .cab -i .cd -i .cdx -i .cer -i .cfg -i .cgt -i .charset -i .chm -i .class -i .classpath -i .cls -i .clw -i .clx -i .cmap -i .cnt -i .conf -i .config -i .configure -i .cset -i .csproj -i .cth -i .cup -i .cur -i .cw9prj -i .cwl -i .darwin -i .dat -i .datasource -i .db -i .dbf -i .dcp -i .dcr -i .dcu -i .def -i .dep -i .dev -i .dfm -i .dic -i .dict -i .dir -i .diz -i .djgpp -i .dll -i .doc -i .docs -i .dof -i .doxygen -i .dpk -i .dpl -i .dpr -i .ds -i .dsstore -i .dsk -i .dsm -i *.dsp -i .dsw -i .dtd -i .dti -i .dv -i .emf -i .erd -i .exe -i .fcs -i .fil -i .files -i .fla -i .form -i .fpt -i .frm -i .frx -i .gid -i .gif -i .global -i .grf -i .guess -i .gxf -i .hfd -i .hlp -i .hm -i .howto -i .hta -i .ico -i .idb -i .ids -i .idx -i .imf -i .iml -i .inc -i .inf -i .inl -i .inst -i .ipr -i .irl -i .isr -i .isu -i .isv -i .iws -i .iwz -i .jar -i .jnlp -i .jpeg -i .jpg -i .jplugin -i .jws -i .kbd -i .lai -i .lex -i .li_ -i .lib -i .lic -i .linux -i .lnk -i .lst -i .m -i .m0 -i .m2v -i .m4 -i .m_index -i .manifest -i .map -i .mat -i .mc -i .mcp -i .mdl -i .me -i .men -i .mf -i .mft -i .mingw32 -i .mk -i .mlet -i .mo -i .mod -i .mp2 -i .mp3 -i .mpg -i .msvc -i .multi -i .nas -i .ncb -i .noi -i .nsi -i .obj -i .oca -i .ocx -i .odl -i .odt -i .old -i .opt -i .original -i .os2 -i .output -i .pat -i .pbxproj -i .pch -i .pch++ -i .pcx -i .pdf -i .pfx -i .pgr -i .plc -i .plg -i .plist -i .png -i .policy -i .positions -i .ppt -i .properties -i .prx -i .ps -i .psd -i .pwli -i .pws -i .py -i .r -i .ra -i .ram -i .ras 
-i .rds -i .rec -i .renamed -i .rep -i .res -i .resorg -i .resx -i .rgs -i .rpt -i .rsu -i .rtf -i .rws -i .sample -i .scc -i .scm -i .settings -i .sin -i .skl -i .sln -i .sm -i .snd -i .spec -i .static -i .sts -i .sub -i .suo -i .swf -i .sys -i .t3 -i .tcd -i .tcs -i .template -i .tga -i .tif -i .tlb -i .tlx -i .tmpl -i .tmstmp -i .tpl -i .tre -i .trg -i .ttf -i .tth -i .ttk -i .unix -i .vbg -i .vbp -i .vbw -i .vcproj -i .ver -i .vsd -i .vspscc -i .vssscc -i .vup -i .vws -i .wav -i .wbmp -i .wingtk -i .wma -i .wmf -i .wml -i .wmlt -i .wmv -i .woe32 -i .wri -i .wsm -i .wsp -i .xls -i .xsd -i .xtc -i .y -i .zip -i .~df -i .~dp -i .~h -i .~pa -i .0 -i .000 -i .001 -i .002 -i .003 -i .004 -i .1 -i .3 -i .A -i .A1 -i .A2 -i .ABC -i .AC -i .ACO -i .ALIAS -i .AM -i .APS -i .ASA -i .ASC -i .ASX -i .AVI -i .AX -i .BAS -i .BIN -i .BMP -i .BMP_ -i .BPI -i .BPL -i .CAB -i .CD -i .CDX -i .CER -i .CFG -i .CGT -i .CHARset -i .CHM -i .CLASS -i .CLASSPATH -i .CLS -i .CLW -i .CLX -i .CMAP -i .CNT -i .CONF -i .CONFIG -i .CONFIGURE -i .Cset -i .CSPROJ -i .CTH -i .CUP -i .CUR -i .CW9PRJ -i .CWL -i .DARWIN -i .DAT -i .DATASOURCE -i .DB -i .DBF -i .DCP -i .DCR -i .DCU -i .DEF -i .DEP -i .DEV -i .DFM -i .DIC -i .DICT -i .DIR -i .DIZ -i .DJGPP -i .DLL -i .DOC -i .DOCS -i .DOF -i .DOXYGEN -i .DPK -i .DPL -i .DPR -i .DS -i .DS_STORE -i .DSK -i .DSM -i .DSP -i .DSW -i .DTD -i .DTI -i .DV -i .EMF -i .ERD -i .EXE -i .FCS -i .FIL -i .FILES -i .FLA -i .FORM -i .FPT -i .FRM -i .FRX -i .GID -i .GIF -i .GLOBAL -i .GRF -i .GUESS -i .GXF -i .HFD -i .HLP -i .HM -i .HOWTO -i .HTA -i .ICO -i .IDB -i .IDS -i .IDX -i .IMF -i .IML -i .INC -i .INF -i .INL -i .INST -i .IPR -i .IRL -i .ISR -i .ISU -i .ISV -i .IWS -i .IWZ -i .JAR -i .JNLP -i .JPEG -i .JPG -i .JPLUGIN -i .JWS -i .KBD -i .LAI -i .LEX -i *.LI -i .LIB -i .LIC -i .LINUX -i .LNK -i .LST -i .M -i .M0 -i .M2V -i .M4 -i .M_INDEX -i .MANIFEST -i .MAP -i .MAT -i .MC -i .MCP -i .MDL -i .ME -i .MEN -i .MF -i .MFT -i .MINGW32 -i .MK -i .MLET -i .MO -i .MOD 
-i .MP2 -i .MP3 -i .MPG -i .MSVC -i .MULTI -i .NAS -i .NCB -i .NOI -i .NSI -i .OBJ -i .OCA -i .OCX -i .ODL -i .ODT -i .OLD -i .OPT -i .ORIGINAL -i .OS2 -i .OUTPUT -i .PAT -i .PBXPROJ -i .PCH -i .PCH++ -i .PCX -i .PDF -i .PFX -i .PGR -i .PLC -i .PLG -i .PLIST -i .PNG -i .POLICY -i .POSITIONS -i .PPT -i .PROPERTIES -i .PRX -i .PS -i .PSD -i .PWLI -i .PWS -i .PY -i .R -i .RA -i .RAM -i .RAS -i .RDS -i .REC -i .RENAMED -i .REP -i .RES -i .RESORG -i .RESX -i .RGS -i .RPT -i .RSU -i .RTF -i .RWS -i .SAMPLE -i .SCC -i .SCM -i .SETTINGS -i .SIN -i .SKL -i .SLN -i .SM -i .SND -i .SPEC -i .STATIC -i .STS -i .SUB -i .SUO -i .SWF -i .SYS -i .T3 -i .TCD -i .TCS -i .TEMPLATE -i .TGA -i .TIF -i .TLB -i .TLX -i .TMPL -i .TMSTMP -i .TPL -i .TRE -i .TRG -i .TTF -i .TTH -i .TTK -i .UNIX -i .VBG -i .VBP -i .VBW -i .VCPROJ -i .VER -i .VSD -i .VSPSCC -i .VSSSCC -i .VUP -i .VWS -i .WAV -i .WBMP -i .WINGTK -i .WMA -i .WMF -i .WML -i .WMLT -i .WMV -i .WOE32 -i .WRI -i .WSM -i .WSP -i .XLS -i .XSD -i .XTC -i .Y -i .ZIP -i .~DF -i .~DP -i .~H -i .~PA -a on -s "D:\grok\working-copies" -d "D:\grok\index"
Scanning for repositories...
Oct 22, 2008 5:04:33 PM org.opensolaris.opengrok.history.HistoryGuru addRepositories
INFO: Adding
Hi, my boss recently asked me to explore what OpenGrok could do for the company I work for. I started with a few projects on a Lubuntu VirtualBox VM; it worked OK, but rather slowly. I blamed my laptop's mediocre specs for that.
Now I have a larger virtual machine and I'm indexing a larger volume of data (an SVN repository with 100 different projects, some of them with multiple branches, tags and trunk; about 100 000 files in total, a few GB in size). All files are checked out directly in SRC_ROOT.
I was hoping for reasonably fast indexing, but it has been running for more than five days now. I can see multiple threads running in htop, but CPU usage is 0.5-2.5% and memory usage is 0.9%, so I guess computing power is not the issue. And unless the HDDs are terribly slow, I don't know what the problem is.
Furthermore, the indexing process seems to be slowing down. At the beginning it took approximately 1 sec/file; now it takes about 5 sec/file. Unfortunately I didn't enable the progress option, so I have no idea how much longer it is going to run.
Any ideas how to make indexing faster? How to use resources more effectively? The current speed is simply unusable...
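A rough back-of-the-envelope estimate from the figures in this post (about 100 000 files at 1 to 5 seconds each). It assumes a constant per-file rate and strictly sequential processing, neither of which is confirmed here, so treat the numbers as an order of magnitude only:

```python
SECONDS_PER_DAY = 86_400


def estimate_days(file_count, seconds_per_file):
    """Return the wall-clock days needed at a constant per-file rate."""
    return file_count * seconds_per_file / SECONDS_PER_DAY


FILES = 100_000  # approximate total quoted in the post

for rate in (1.0, 5.0):
    print(f"{rate:.0f} s/file -> {estimate_days(FILES, rate):.1f} days")
```

At 5 sec/file the whole tree would take a bit under six days, which is in line with the run time reported above.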
@Draczech I think the reason is slow svn. I'd say disable history search, or try the JavaDB history cache backend, which can do incremental indexing (so it won't poll for the full history every time you index). SCM systems such as CVS, SVN and SCCS are simply slow by default, and when building the index OpenGrok usually queries for the whole history. Distributed systems such as hg or git will give you a big boost: try converting one of your svn repos to git/hg and you'll see the difference.
Of course, if -H didn't help you ... the other option is to look at how to improve svn indexing in OpenGrok. Your call; the code is easy to understand. (For me personally svn is dead, so I don't think it's feasible to spend time on it ... sorry.)
Well, SVN is not my choice; it is what the company I work for uses. And the history index is considered one of the benefits of using OpenGrok, so I can't really just turn it off like that. On the other hand, once the initial indexing is done, the next incremental index is much faster. So only the initial index phase takes a large amount of time, which leaves OpenGrok still usable. Thanks anyway for the help.
Well, then the only advice I can give is to look at svn and its commands and figure out whether we can optimize those calls somehow.
Which version is this? OpenGrok has supported incremental history indexing for the file-based history cache since 0.12.
Another workaround would be to add a bunch of projects each time the indexer is run, until all of them are indexed.
If -H is used, the indexer process basically runs `svn log --xml` for all repos and then parses the output to create an inverted map (so that for each file it has the list of changesets in which the file changed). If -H is not specified then the indexer needs a list of repositories to create the inverted map; otherwise it will skip this step and proceed to creating xrefs for all files. But in order to populate the Lucene document it needs the history for the given file, so for every single file it fetches it via `svn log file`, which is much slower than getting the history for the whole repository in one go, as the overhead of spawning a process is much bigger.
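The inverted map described above can be sketched roughly as follows. This is not OpenGrok's actual code (the indexer is written in Java); it is a minimal Python illustration that assumes the log was produced with the verbose flag (`svn log --xml -v`) so that each `logentry` lists its changed paths. The sample XML and the name `build_inverted_map` are made up for the example:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, hand-written sample in the shape of `svn log --xml -v` output.
SAMPLE_LOG = """<log>
  <logentry revision="3">
    <author>alice</author>
    <paths>
      <path action="M" kind="file">/trunk/a.c</path>
      <path action="A" kind="file">/trunk/b.c</path>
    </paths>
    <msg>touch a, add b</msg>
  </logentry>
  <logentry revision="2">
    <author>bob</author>
    <paths>
      <path action="M" kind="file">/trunk/b.c</path>
    </paths>
    <msg>fix b</msg>
  </logentry>
  <logentry revision="1">
    <author>alice</author>
    <paths>
      <path action="A" kind="file">/trunk/a.c</path>
    </paths>
    <msg>initial import</msg>
  </logentry>
</log>"""


def build_inverted_map(log_xml):
    """Map each changed path to the revisions that touched it, in log order."""
    changes = defaultdict(list)
    for entry in ET.fromstring(log_xml).iter("logentry"):
        revision = int(entry.get("revision"))
        for path in entry.iterfind("paths/path"):
            changes[path.text].append(revision)
    return dict(changes)


if __name__ == "__main__":
    for path, revisions in sorted(build_inverted_map(SAMPLE_LOG).items()):
        print(path, revisions)
```

The point of the design is that one pass over one repository-wide log yields the history of every file, instead of one svn process per file.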
It would be really nice if you could drill down to see what is causing the delays. For instance, try `strace`-ing the indexing process (and its children, like the `svn` command) to see what it is doing in terms of syscalls. How fast are the `svn log` commands for the repos when run standalone? What about network traffic: does it progress normally or are there lags?
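One way to answer the standalone-timing question is a small wrapper like the one below. It is only a sketch: the helper name `time_command` is made up, it assumes `svn` is on the PATH, and the working-copy paths are whatever you pass on the command line.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv, discard its output, and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


if __name__ == "__main__":
    # Each argument is assumed to be the path of an SVN working copy.
    for path in sys.argv[1:]:
        seconds, code = time_command(["svn", "log", "--xml", "-v", path])
        print(f"{path}: {seconds:.2f} s (exit code {code})")
```

Comparing the whole-repository time with the time for a single file in the same working copy should show whether the per-file calls are the bottleneck.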
I faced the same issue. Each `svn log` command used by OpenGrok to index each file was taking more than one minute to execute. When I tried the same command on the SVN server using `file://` URLs instead of `https://` URLs, they took less than a second!
To make `svn log` super-fast I inserted the following directive in Apache's httpd configuration:
SVNPathAuthz off
After that, `svn log` executed remotely via `https://` started to take less than a second too!
Disabling this option has some security implications, so it's important to understand them.
For me the performance improvement more than offsets the security issues, but YMMV. Perhaps you could try to disable it only during the initial repository indexing and reenable it afterwards, since subsequent reindex operations are usually much faster anyway.
I hope it helps.
status REOPENED severity major in component indexer for --- Reported in version unspecified on platform Other Assigned to: Trond Norbye
Original attachment names and IDs:
On 2008-04-11 15:47:15 +0000, Moisei wrote:
On 2008-05-29 13:25:41 +0000, Cory Remick wrote:
On 2008-08-21 13:15:31 +0000, Trond Norbye wrote:
On 2008-10-22 14:36:44 +0000, Moisei wrote:
On 2008-10-22 14:38:08 +0000, Moisei wrote: