oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

Error message while parsing the XML output when indexing a big repository #1172

Open: cosmoJFH opened this issue 8 years ago

cosmoJFH commented 8 years ago

I am indexing a big repository (400 GB) and received the following error. The indexing process continues after the message is logged. Is this message relevant, and how does it affect the final results?

This is the command I used to index the repository:

JAVA_HOME=/software/jdk1.8.0_60/ \
JAVA_OPTS="-d64 -Xms2024m -Xmx6096m -server -XX:-UseGCOverheadLimit" \
OPENGROK_FLUSH_RAM_BUFFER_SIZE="-m 256" \
EXUBERANT_CTAGS=/usr/local/bin/ctags \
OPENGROK_APP_SERVER="Tomcat" \
OPENGROK_TOMCAT_BASE=/home/svnsearch/tomcat_svnsearch_test \
OPENGROK_WEBAPP_CONTEXT="dpacsvn" \
OPENGROK_SCAN_DEPTH=20 \
READ_XML_CONFIGURATION=$OpenGrok/readonly_configuration.xml \
OPENGROK_LOGGER_CONFIG_PATH=$OpenGrok/logging.properties \
OPENGROK_SUBVERSION_USERNAME=$userName \
OPENGROK_SUBVERSION_PASSWORD=$pass \
OPENGROK_DISTRIBUTION_BASE=$OpenGrok/dist \
OPENGROK_INSTANCE_BASE=/svn-search/data_opengrok/ \
IGNORE_PATTERNS="-i .dat -i .gbin -i .fits -i .fit -i .png -i .jpg -i .jpeg -i .gif -i d:.svn" \
$OpenGrok/OpenGrok index $sourceDir

This is the error:

08:01:03 SEVERE: An error occurred while parsing the xml output
org.xml.sax.SAXParseException; lineNumber: 424045; columnNumber: 1; XML document structures must start and end within the same entity.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.endEntity(XMLDocumentFragmentScannerImpl.java:903)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.endEntity(XMLDocumentScannerImpl.java:563)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.endEntity(XMLEntityManager.java:1394)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1764)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1413)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2823)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:333)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at org.opensolaris.opengrok.history.SubversionHistoryParser.processStream(SubversionHistoryParser.java:189)
    at org.opensolaris.opengrok.util.Executor.exec(Executor.java:212)
    at org.opensolaris.opengrok.history.SubversionHistoryParser.parse(SubversionHistoryParser.java:161)
    at org.opensolaris.opengrok.history.SubversionRepository.getHistory(SubversionRepository.java:251)
    at org.opensolaris.opengrok.history.Repository.createCache(Repository.java:306)
    at org.opensolaris.opengrok.history.HistoryGuru.createCache(HistoryGuru.java:509)
    at org.opensolaris.opengrok.history.HistoryGuru.access$000(HistoryGuru.java:54)
    at org.opensolaris.opengrok.history.HistoryGuru$1.run(HistoryGuru.java:560)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
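The trace shows the JDK's bundled Xerces parser hitting the end of its input (the `endEntity` frames) while the Subversion history XML read by `SubversionHistoryParser` still had open elements. That kind of error is typically what a cut-off XML stream produces. The thread does not establish that the svn output was actually truncated here, so the following is only a minimal Java sketch of that general failure mode. The class name `TruncatedXmlDemo` and the sample log are hypothetical, and the XML is a heavily shortened stand-in for real svn log output.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class TruncatedXmlDemo {

    // Hypothetical, heavily shortened stand-in for svn's XML log output.
    static final String COMPLETE =
            "<?xml version=\"1.0\"?>\n"
            + "<log>\n"
            + "<logentry revision=\"1\"><author>alice</author><msg>init</msg></logentry>\n"
            + "</log>\n";

    /** Simulates a stream that stopped before the closing </log> tag arrived. */
    static String truncated() {
        return COMPLETE.substring(0, COMPLETE.indexOf("</log>"));
    }

    /** Returns null if the document parses, otherwise a description of the parse error. */
    static String parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content but rethrows fatal errors.
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("complete:  " + parse(COMPLETE));
        System.out.println("truncated: " + parse(truncated()));
    }
}
```

The complete document should parse (printing `null`), while the truncated one should be rejected with a `SAXParseException`; the exact message wording depends on the parser implementation.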

tarzanek commented 8 years ago

It's a bug in the svn parser. Can the svn repository that causes this be cloned publicly so we can try to reproduce it?

cosmoJFH commented 8 years ago

Unfortunately, it is a private repository. If you can give me some guidance, I will try to fix it myself.

cosmoJFH commented 7 years ago

I think the issue was related to memory. I changed the following parameters:

  1. JAVA_OPTS="-d64 -Xms3072m -Xmx6500m -Xss512k -server -XX:-UseGCOverheadLimit"
  2. OPENGROK_FLUSH_RAM_BUFFER_SIZE="-m 128"
  3. timeout=0

I indexed the repository from scratch and did not get the exception.
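The first change in that list consists of JVM heap and stack options passed through `JAVA_OPTS`. As a quick sanity check that such flags actually reach a JVM, the small Java sketch below prints the maximum heap the running JVM reports and the arguments it was started with. It is not part of OpenGrok and the class name `HeapSettingsCheck` is hypothetical; it only inspects whatever JVM runs it.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapSettingsCheck {

    /** Maximum heap this JVM will try to use, in MiB (close to, but not always equal to, -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** JVM options (such as -Xms, -Xmx, -Xss) this JVM was started with. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM arguments:  " + jvmArguments());
    }
}
```

For example, after compiling with `javac HeapSettingsCheck.java`, run `java -Xms3072m -Xmx6500m -Xss512k HeapSettingsCheck`. The reported maximum heap can be somewhat lower than the `-Xmx` value, depending on the garbage collector.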

tarzanek commented 7 years ago

It could very well be. Good, so we have a workaround now.