macressler / alfresco-bulk-filesystem-import

Automatically exported from code.google.com/p/alfresco-bulk-filesystem-import
GNU Lesser General Public License v3.0

OutOfMemory with large folder hierarchy #109

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a large folder hierarchy, for example 9 levels of folders with about 24M folders in total. Never put more than 10 sub-folders in each folder; documents can go in the last level.
2. Start the importer with a heap space of 4 GB (or less).

What is the expected output?
The importer should run without OutOfMemory errors.

What do you see instead?
After several hours, the workQueue of the BulkFilesystemImporterThreadPoolExecutor grows so large that an OutOfMemoryError is thrown. (I had more than 1.2M UnitOfWork instances in the queue.)

What version of the product are you using? On what operating system?
I used the Tag Version 1.1 with Alfresco EE 3.4.6 on an Ubuntu 10.04.4 LTS 
64bit with Java SE 1.6.0_29 64bit.

-- FIX --
I fixed this problem by setting a maxCapacity value on the LinkedBlockingQueue in the BulkFilesystemImporterThreadPoolExecutor. To avoid imports being rejected, I used CallerRunsPolicy as the RejectedExecutionHandler.

public BulkFilesystemImporterThreadPoolExecutor(final int corePoolSize,
                                                final int maximumPoolSize,
                                                final long keepAliveTime,
                                                final TimeUnit keepAliveTimeUnit,
                                                final int blockingQueueSize)
{
    // Bounded queue: caps the memory consumed by pending UnitOfWork tasks
    super(corePoolSize, maximumPoolSize, keepAliveTime, keepAliveTimeUnit,
          new LinkedBlockingQueue<Runnable>(blockingQueueSize),
          new BulkFilesystemImporterThreadFactory());

    if (log.isDebugEnabled())
        log.debug("Creating new bulk import thread pool." +
                  "\n\tcorePoolSize      = " + corePoolSize +
                  "\n\tmaximumPoolSize   = " + maximumPoolSize +
                  "\n\tkeepAliveTime     = " + keepAliveTime + " " + String.valueOf(keepAliveTimeUnit) +
                  "\n\tblockingQueueSize = " + blockingQueueSize);

    // When the queue is full, run the task on the submitting thread instead of rejecting it
    this.setRejectedExecutionHandler(new CallerRunsPolicy());
}

For convenience during testing, I exposed the maxCapacity through the constructor of the BulkFilesystemImporterThreadPoolExecutor class, the Spring config, and alfresco-global.properties.
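The effect of the fix can be sketched with a small standalone example (illustrative only, not importer code): with a bounded queue and CallerRunsPolicy, a submission that finds the queue full is not rejected but executed on the submitting thread, which throttles the producer instead of letting the queue grow without bound.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

public class CallerRunsDemo {
    public static void main(String[] args) throws Exception {
        // One worker, queue capacity 1: the third submission finds the queue full.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        CountDownLatch block = new CountDownLatch(1);
        AtomicBoolean ranOnCaller = new AtomicBoolean(false);
        String caller = Thread.currentThread().getName();

        // Task 1 occupies the single worker thread until released.
        pool.execute(() -> { try { block.await(); } catch (InterruptedException ignored) {} });
        // Task 2 fills the queue (capacity 1).
        pool.execute(() -> {});
        // Task 3 is "rejected" -> CallerRunsPolicy runs it on the caller (main) thread.
        pool.execute(() -> ranOnCaller.set(Thread.currentThread().getName().equals(caller)));

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("third task ran on caller thread: " + ranOnCaller.get());
        // prints: third task ran on caller thread: true
    }
}
```

Because the caller thread is busy running the overflow task, it cannot submit new work, so the queue depth stays bounded by blockingQueueSize.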

Original issue reported on code.google.com by sahli.al...@gmail.com on 30 Jul 2012 at 10:08

GoogleCodeExporter commented 9 years ago
Thanks for the fix!  I'll merge it into the next version of the tool.

Original comment by pmo...@gmail.com on 28 Aug 2012 at 11:43

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 887d5de11a79.

Original comment by pmo...@gmail.com on 9 Sep 2012 at 10:10

GoogleCodeExporter commented 9 years ago
Hi Peter!

Why did you not commit this fix in the 3x branch?

Regards, Alain

Original comment by sahli.al...@gmail.com on 10 Sep 2012 at 10:16

GoogleCodeExporter commented 9 years ago
G'day Alain,

I'm planning on merging it back as part of the work on issue 110 [1]
(which also requires some back-merging).

Cheers,
Peter

[1]
http://code.google.com/p/alfresco-bulk-filesystem-import/issues/detail?id=110


Original comment by pmo...@gmail.com on 10 Sep 2012 at 5:33

GoogleCodeExporter commented 9 years ago
Ok, thanks for the quick answer!

Cheers,
Alain

Original comment by sahli.al...@gmail.com on 10 Sep 2012 at 5:35

GoogleCodeExporter commented 9 years ago
Note: I had to reverse this change after noticing some problems around 
out-of-order execution and difficulties in stopping the thread pool.  Note the 
following text from the Javadocs [1] (emphasis added):

"New tasks submitted in method execute(java.lang.Runnable) will be rejected 
_*when the Executor has been shut down*_, and also when the Executor uses 
finite bounds for both maximum threads and work queue capacity, and is 
saturated."

[1] http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
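The out-of-order behaviour is inherent to CallerRunsPolicy: when the queue is full, the submitting thread executes the task itself, so that task can complete before tasks that were queued earlier. A minimal illustration (not importer code) of a later task "jumping the queue":

```java
import java.util.List;
import java.util.concurrent.*;

public class OutOfOrderDemo {
    public static void main(String[] args) throws Exception {
        // One worker, queue capacity 1, CallerRunsPolicy on overflow.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        List<Integer> order = new CopyOnWriteArrayList<>();
        CountDownLatch block = new CountDownLatch(1);

        // Task 1 holds the worker thread until released, then records itself.
        pool.execute(() -> { try { block.await(); } catch (InterruptedException ignored) {} order.add(1); });
        // Task 2 waits in the queue.
        pool.execute(() -> order.add(2));
        // Task 3 overflows the queue and runs on the caller thread, finishing FIRST.
        pool.execute(() -> order.add(3));

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(order);
        // prints: [3, 1, 2] -- task 3 completed before the earlier tasks
    }
}
```

For a strictly sequential importer this reordering is harmless only if units of work are independent; the shutdown/rejection interaction quoted above is the other reason the change was reverted.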

Original comment by pmo...@gmail.com on 3 Dec 2013 at 6:18

GoogleCodeExporter commented 9 years ago
I think the only option if you're importing an exceptionally large folder 
hierarchy will be to throw more memory at Alfresco.

Original comment by pmo...@gmail.com on 3 Dec 2013 at 6:20