Open vladak opened 9 years ago
According to @tarzanek the file itself is invalid (see http://www.imagemagick.org/discourse-server/viewtopic.php?t=22851, titled: Links in www/api Documentation bloated with 16384 "../"); however, the analyzer should not be that sensitive.
The workaround is to add this file to the ignored list (the -i option or the IGNORE_PATTERNS environment variable used by the OpenGrok script).
For completeness the indexer was running with:
ncpus=`/sbin/psrinfo | grep on-line | wc -l`
nthr=`expr $ncpus \* 2`
# more efficient in case multiple projects wait on renamed files processing
# see https://github.com/OpenGrok/OpenGrok/pull/752 for details
tunables="-Dorg.opensolaris.opengrok.history.NumCacheRenamedThreads=$nthr"
tunables="$tunables -Dorg.opensolaris.opengrok.history.RenamedHandlingEnabled=1"
# for userland prepped repositories
tunables="$tunables -Dorg.opensolaris.opengrok.history.noFetchWhenNotInCache=1"
# https://github.com/OpenGrok/OpenGrok/issues/718
JAVA_OPTS="-d64 -XX:-UseGCOverheadLimit -Xmx8192m -server $tunables"
# speed up indexing by tuning Lucene memory buffer size
OPENGROK_FLUSH_RAM_BUFFER_SIZE="-m 256"
So this is actually a bug in the compact handling, and the reason we don't see it in other analyzers is their path rules:
JAVA:
File = [a-zA-Z]{FNameChar}* "." ("java"|"properties"|"props"|"xml"|"conf"|"txt"|"htm"|"html"|"ini"|"jnlp"|"jad"|"diff"|"patch")
Path = "/"? [a-zA-Z]{FNameChar}* ("/" [a-zA-Z]{FNameChar}*[a-zA-Z0-9])+
XML:
File = {FNameChar}+ "." ([a-zA-Z]+) {FNameChar}*
Path = "/"? {FNameChar}+ ("/" {FNameChar}+)+[a-zA-Z0-9]
SH:
Path = "/"? [a-zA-Z]{FNameChar}* ("/" [a-zA-Z]{FNameChar}*)+[a-zA-Z0-9]
So all analyzers have broken path detection. If it worked, we could get better links where appropriate.
Test on paths such as:
../../../java.bah
../ffaa/foobar
../foobar.i
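To make the problem concrete, here is a rough sketch that transliterates the SH Path rule into a Python regular expression and runs it over the three test paths. The definition of FNameChar is not shown in this thread, so the character class below is an assumption, and re.search only approximates how the JFlex-generated lexer scans its input.

```python
import re

# ASSUMPTION: FNameChar is not quoted in this thread; this class is a guess.
FNAME_CHAR = r"[a-zA-Z0-9_.\-]"

# SH rule: Path = "/"? [a-zA-Z]{FNameChar}* ("/" [a-zA-Z]{FNameChar}*)+[a-zA-Z0-9]
SH_PATH = re.compile(
    r"/?[a-zA-Z]" + FNAME_CHAR + r"*"
    r"(?:/[a-zA-Z]" + FNAME_CHAR + r"*)+"
    r"[a-zA-Z0-9]"
)


def match_path(text):
    """Return the first substring the Path rule would accept, or None."""
    m = SH_PATH.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ("../../../java.bah", "../ffaa/foobar", "../foobar.i"):
        print(sample, "->", match_path(sample))
```

Under that assumption, none of the three paths is recognized as a whole: every segment must start with a letter, so the leading "../" components can never be part of a match, and only "/ffaa/foobar" is picked up from the second sample.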
Funnily enough, we would then be able to see why #806 was not fully fixed: that code never worked, and when compact is true the output would be eaten if prefixed by "../".
So breadcrumbPath needs some serious fixes and tests with deeper paths (no matter whether compact is true or false).
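As a starting point for such tests, here is a minimal sketch of one way a compact mode could treat ".." segments without eating the output. This is not the code in Util.java; the function name compact_segments and its behaviour are purely hypothetical, chosen only to show the cases worth covering.

```python
def compact_segments(path, sep="/"):
    """Hypothetical helper: collapse "x/.." pairs but keep leading ".." segments.

    Runs in a single pass, so deep paths do not need large temporary arrays.
    """
    out = []
    for seg in path.split(sep):
        if seg in ("", "."):
            continue  # skip empty and current-directory segments
        if seg == ".." and out and out[-1] != "..":
            out.pop()  # "a/.." cancels out
        else:
            out.append(seg)  # a leading ".." is preserved, not dropped
    return out


if __name__ == "__main__":
    print(compact_segments("../ffaa/foobar"))
    print(compact_segments("a/b/../c"))
    # Deep path modelled on the 16384 "../" case mentioned in this issue
    print(len(compact_segments("../" * 16384 + "x")))
```

The deep-path case is the interesting one for a regression test: the result should grow linearly with the number of segments and should still end with the final file name.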
Also, just to be on the safe side: Integer.MAX_VALUE - 5 is roughly the max size of an array (the exact limit depends on the JVM), so we should probably check that the input data won't exceed this limit (though after breadcrumbPath (Util.java:278) is fixed I doubt we will ever hit the limit, or only once OSes support LOOOONG paths with zillions of subdirs).
I guess such files will have to be ignored for now
Running the indexer on the Solaris Userland consolidation ended prematurely with:
The file in question is a 48MB XHTML file which looks like this: