Updated the POM to set the default memory limit to 512MB, up from 256MB. I'll
monitor this for a few weeks to see if this improves things.
Original comment by trevor.s...@gmail.com
on 30 Jan 2009 at 3:08
Re-created a similar error with 1 GB of memory allocated to the JVM (-Xmx1g -Xms1g)
while running 'galago make-corpus'.
The collection is approximately 16 GB of (uncompressed) HTML pages, and Galago
created about 15 GB of data in temporary files.
Environment: Java 1.6, Galago 1.01, on a quad-core Linux box (Fedora
Core 7) with 2 GB of RAM.
Exception below:
$ galago make-corpus corpus input/
2009-02-03 13:54:04.970::INFO: Logging to STDERR via org.mortbay.log.StdErrLog
2009-02-03 13:54:04.973::INFO: jetty-6.1.5
2009-02-03 13:54:04.036::INFO: Started SocketConnector@0.0.0.0:34653
Status: http://localhost:34653
Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap space
at org.galagosearch.tupleflow.BufferedFileDataStream.cache(BufferedFileDataStream.java:223)
at org.galagosearch.tupleflow.BufferedFileDataStream.readFully(BufferedFileDataStream.java:96)
at org.galagosearch.tupleflow.VByteInput.readFully(VByteInput.java:23)
at org.galagosearch.tupleflow.ArrayInput.readBytes(ArrayInput.java:74)
at org.galagosearch.core.types.KeyValuePair$KeyOrder$ShreddedReader.fill(KeyValuePair.java:506)
at org.galagosearch.core.types.KeyValuePair$KeyOrder$ShreddedCombiner.initialize(KeyValuePair.java:368)
at org.galagosearch.core.types.KeyValuePair$KeyOrder$ShreddedCombiner.run(KeyValuePair.java:378)
at org.galagosearch.tupleflow.OrderedCombiner.run(OrderedCombiner.java:141)
at org.galagosearch.tupleflow.Sorter.combineStep(Sorter.java:508)
at org.galagosearch.tupleflow.Sorter.combine(Sorter.java:488)
at org.galagosearch.tupleflow.Sorter.close(Sorter.java:290)
at org.galagosearch.tupleflow.StandardStep.close(StandardStep.java:22)
at org.galagosearch.tupleflow.StandardStep.close(StandardStep.java:22)
at org.galagosearch.core.parse.DocumentSource.run(DocumentSource.java:160)
at org.galagosearch.tupleflow.execution.ThreadedStageExecutor$InstanceRunnable.run(ThreadedStageExecutor.java:57)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Exception in thread "main" java.util.concurrent.ExecutionException: Stage threw an exception:
at org.galagosearch.tupleflow.execution.JobExecutor$JobExecutionStatus.waitForStages(JobExecutor.java:1135)
at org.galagosearch.tupleflow.execution.JobExecutor$JobExecutionStatus.run(JobExecutor.java:1073)
at org.galagosearch.tupleflow.execution.JobExecutor.runWithServer(JobExecutor.java:1191)
at org.galagosearch.tupleflow.execution.JobExecutor.runLocally(JobExecutor.java:1215)
at org.galagosearch.core.tools.App.handleMakeCorpus(App.java:210)
at org.galagosearch.core.tools.App.main(App.java:434)
Caused by: java.io.EOFException: Tried to read off the end of the file.
at org.galagosearch.tupleflow.BufferedFileDataStream.cache(BufferedFileDataStream.java:216)
at org.galagosearch.tupleflow.BufferedFileDataStream.readUnsignedByte(BufferedFileDataStream.java:199)
at org.galagosearch.tupleflow.VByteInput.readInt(VByteInput.java:73)
at org.galagosearch.tupleflow.ArrayInput.readInt(ArrayInput.java:19)
at org.galagosearch.tupleflow.ArrayInput.readBytes(ArrayInput.java:72)
at org.galagosearch.tupleflow.ArrayInput.readString(ArrayInput.java:124)
at org.galagosearch.tupleflow.FileOrderedReader.<init>(FileOrderedReader.java:34)
at org.galagosearch.tupleflow.FileOrderedReader.<init>(FileOrderedReader.java:58)
at org.galagosearch.tupleflow.FileOrderedReader.<init>(FileOrderedReader.java:72)
at org.galagosearch.tupleflow.execution.StageInstanceFactory.getTypeReaderSource(StageInstanceFactory.java:215)
at org.galagosearch.tupleflow.execution.StageInstanceFactory.instantiateInput(StageInstanceFactory.java:158)
at org.galagosearch.tupleflow.execution.StageInstanceFactory.instantiate(StageInstanceFactory.java:96)
at org.galagosearch.tupleflow.execution.StageInstanceFactory.instantiate(StageInstanceFactory.java:80)
at org.galagosearch.tupleflow.execution.ThreadedStageExecutor$InstanceRunnable.run(ThreadedStageExecutor.java:56)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Original comment by jel...@gmail.com
on 3 Feb 2009 at 7:47
I had a similar problem even with -Xmx2048m when attempting to index the 2GB WT2G
(http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html) collection on
Ubuntu Linux 8.10:
Exception in thread "pool-1-thread-5" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.galagosearch.core.index.CompressedByteBuffer.addRaw(CompressedByteBuffer.java:28)
at org.galagosearch.core.index.CompressedByteBuffer.add(CompressedByteBuffer.java:43)
at org.galagosearch.core.index.ExtentListBuffer.addDocument(ExtentListBuffer.java:54)
at org.galagosearch.core.index.ExtentIndexWriter.processNumber(ExtentIndexWriter.java:60)
at org.galagosearch.core.types.NumberedExtent$ExtentNameNumberBeginOrder$DuplicateEliminator.processNumber(NumberedExtent.java:810)
at org.galagosearch.core.types.NumberedExtent$ExtentNameNumberBeginOrder$ShreddedBuffer.copyUntilNumber(NumberedExtent.java:485)
at org.galagosearch.core.types.NumberedExtent$ExtentNameNumberBeginOrder$ShreddedBuffer.copyUntilExtentName(NumberedExtent.java:462)
at org.galagosearch.core.types.NumberedExtent$ExtentNameNumberBeginOrder$ShreddedBuffer.copyUntil(NumberedExtent.java:528)
at org.galagosearch.core.types.NumberedExtent$ExtentNameNumberBeginOrder$ShreddedCombiner.run(NumberedExtent.java:587)
at org.galagosearch.tupleflow.OrderedCombiner.run(OrderedCombiner.java:141)
at org.galagosearch.tupleflow.execution.ThreadedStageExecutor$InstanceRunnable.run(ThreadedStageExecutor.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Exception in thread "pool-1-thread-7" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$TupleUnshredder.clone(DocumentWordPosition.java:1455)
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$TupleUnshredder.processTuple(DocumentWordPosition.java:1477)
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$DuplicateEliminator.processTuple(DocumentWordPosition.java:1435)
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$ShreddedBuffer.copyTuples(DocumentWordPosition.java:1019)
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$ShreddedBuffer.copyUntilIndexPosition(DocumentWordPosition.java:1043)
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$ShreddedBuffer.copyUntilIndexWord(DocumentWordPosition.java:1035)
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$ShreddedBuffer.copyUntilDocument(DocumentWordPosition.java:1060)
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$ShreddedBuffer.copyUntil(DocumentWordPosition.java:1128)
at org.galagosearch.core.types.DocumentWordPosition$DocumentWordPositionOrder$ShreddedCombiner.run(DocumentWordPosition.java:1187)
at org.galagosearch.tupleflow.OrderedCombiner.run(OrderedCombiner.java:141)
at org.galagosearch.tupleflow.execution.ThreadedStageExecutor$InstanceRunnable.run(ThreadedStageExecutor.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Exception in thread "pool-1-thread-4" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Integer.valueOf(Integer.java:601)
at org.galagosearch.core.types.NumberWordPosition$WordDocumentPositionOrder$ShreddedBuffer.processPosition(NumberWordPosition.java:296)
at org.galagosearch.core.types.NumberWordPosition$WordDocumentPositionOrder$ShreddedReader.updatePosition(NumberWordPosition.java:740)
at org.galagosearch.core.types.NumberWordPosition$WordDocumentPositionOrder$ShreddedReader.fill(NumberWordPosition.java:706)
at org.galagosearch.core.types.NumberWordPosition$WordDocumentPositionOrder$ShreddedCombiner.run(NumberWordPosition.java:579)
at org.galagosearch.tupleflow.OrderedCombiner.run(OrderedCombiner.java:141)
at org.galagosearch.tupleflow.execution.ThreadedStageExecutor$InstanceRunnable.run(ThreadedStageExecutor.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Exception in thread "pool-1-thread-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "Thread-18081" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-8" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "main" java.util.concurrent.ExecutionException: Stage threw an exception:
at org.galagosearch.tupleflow.execution.JobExecutor$JobExecutionStatus.waitForStages(JobExecutor.java:1135)
at org.galagosearch.tupleflow.execution.JobExecutor$JobExecutionStatus.run(JobExecutor.java:1054)
at org.galagosearch.tupleflow.execution.JobExecutor.runWithServer(JobExecutor.java:1191)
at org.galagosearch.tupleflow.execution.JobExecutor.runLocally(JobExecutor.java:1215)
at org.galagosearch.core.tools.App.handleBuild(App.java:121)
at org.galagosearch.core.tools.App.main(App.java:422)
Caused by: java.io.IOException: Problem when calling close method
at org.galagosearch.tupleflow.Linkage.close(Linkage.java:114)
at org.galagosearch.tupleflow.OrderedCombiner.run(OrderedCombiner.java:144)
at org.galagosearch.tupleflow.execution.ThreadedStageExecutor$InstanceRunnable.run(ThreadedStageExecutor.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.galagosearch.tupleflow.Linkage.close(Linkage.java:109)
... 5 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Original comment by tim.g.ar...@gmail.com
on 24 Apr 2009 at 1:34
I looked into this further and can offer some more information. I was using the
ThreadedStageExecutor, which has one thread in its thread pool for each processor
on the system. It was running on a machine with 8 cores, so I presume this means
that it will try to run stage instances on all 8 processors simultaneously where
that is permitted. I imagine this would mean there's a strong relationship between
peak memory usage and the number of (logical?) processors on the machine.
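
To make the suspected relationship concrete, here is a minimal sketch, assuming the pool is a fixed-size pool with one thread per logical processor as described above. The class name HeapPerWorker and its method are hypothetical and are not part of Galago; the even split is only a rough bound on what each concurrently running stage instance can hold, not a measurement.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper, not part of Galago: estimates how much heap each
// worker can use if the heap is shared evenly among pool threads.
public class HeapPerWorker {

    /** Even share of the heap per worker thread, in bytes. */
    static long perWorkerBytes(long maxHeapBytes, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        return maxHeapBytes / workers;
    }

    public static void main(String[] args) {
        // Number of logical processors visible to the JVM.
        int processors = Runtime.getRuntime().availableProcessors();
        // Upper limit of the heap, as set by -Xmx (or the JVM default).
        long maxHeap = Runtime.getRuntime().maxMemory();

        // Assumption: one pool thread per logical processor.
        ExecutorService pool = Executors.newFixedThreadPool(processors);
        System.out.printf("processors=%d, max heap=%d MB, per-worker share=%d MB%n",
                processors,
                maxHeap / (1024 * 1024),
                perWorkerBytes(maxHeap, processors) / (1024 * 1024));
        pool.shutdown();
    }
}
```

With -Xmx2048m on an 8-core machine, the even share works out to 256 MB per worker, so raising the heap limit on a machine with more cores may not raise the per-stage headroom at all.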
Original comment by tim.g.ar...@gmail.com
on 11 May 2009 at 6:54
Galago has a thread that runs constantly to check that the process isn't close
to running out of memory; if it does get close, all activity stops and data is
flushed to disk. This has been fairly stable in the past, but I've started to see
out of memory errors since January.
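
For readers unfamiliar with this kind of safeguard, the following is a minimal sketch of a polling memory watchdog. It is not Galago's actual implementation: the class MemoryWatchdog, the threshold, the polling interval, and the flush callback are all assumptions made for illustration.

```java
// Hypothetical sketch of a memory watchdog; not Galago's actual code.
public class MemoryWatchdog {

    /** Fraction of the maximum heap that is currently in use. */
    static double usedFraction(long totalBytes, long freeBytes, long maxBytes) {
        return (double) (totalBytes - freeBytes) / maxBytes;
    }

    /** True when usage has reached the threshold and buffers should be flushed. */
    static boolean shouldFlush(double usedFraction, double threshold) {
        return usedFraction >= threshold;
    }

    /** Starts a daemon thread that polls heap usage and calls flush when it is high. */
    static Thread start(Runnable flush, double threshold, long pollMillis) {
        Thread watchdog = new Thread(() -> {
            Runtime rt = Runtime.getRuntime();
            while (!Thread.currentThread().isInterrupted()) {
                double used = usedFraction(rt.totalMemory(), rt.freeMemory(), rt.maxMemory());
                if (shouldFlush(used, threshold)) {
                    flush.run(); // assumed to spill in-memory buffers to disk
                }
                try {
                    Thread.sleep(pollMillis);
                } catch (InterruptedException e) {
                    return; // stop polling when interrupted
                }
            }
        }, "memory-watchdog");
        watchdog.setDaemon(true);
        watchdog.start();
        return watchdog;
    }
}
```

A polling check like this only reacts between polls, so a single very large allocation can exhaust the heap before the next check has a chance to trigger a flush.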
Possible issues:
- make-corpus creates huge tuples out of Documents, which may cause memory
overflow, since TupleFlow assumes that each tuple is small. This may overflow
fixed-size buffers in ShreddedBuffer.
- This could also happen in your case, tim.g.armstrong, but I'm not sure; the
extent tuples should be much smaller, but perhaps some documents and/or extents
have very long names.
Original comment by trevor.s...@gmail.com
on 11 May 2009 at 1:40
Original issue reported on code.google.com by
trevor.s...@gmail.com
on 28 Jan 2009 at 6:11