macressler / alfresco-bulk-filesystem-import

Automatically exported from code.google.com/p/alfresco-bulk-filesystem-import
GNU Lesser General Public License v3.0

OutOfMemory with large folder hierarchy #109

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create a large folder hierarchy, for example 9 levels of folders with about 24M folders in total. Never put more than 10 sub-folders in each folder; documents can go in the last level.
2. Start the importer with a heap space of 4 GB (or less).

What is the expected output?
The importer should run without OutOfMemory errors.

What do you see instead?
After several hours, the workQueue of the BulkFilesystemImporterThreadPoolExecutor grows so large that an OutOfMemoryError is thrown. (I had more than 1.2M UnitOfWork instances in the queue.)

What version of the product are you using? On what operating system?
I used the Tag Version 1.1 with Alfresco EE 3.4.6 on an Ubuntu 10.04.4 LTS 
64bit with Java SE 1.6.0_29 64bit.

-- FIX --
I fixed this problem by setting a maxCapacity value on the LinkedBlockingQueue in the BulkFilesystemImporterThreadPoolExecutor. To avoid imports being rejected, I used CallerRunsPolicy as the RejectedExecutionHandler.

public BulkFilesystemImporterThreadPoolExecutor(final int corePoolSize,
                                                final int maximumPoolSize,
                                                final long keepAliveTime,
                                                final TimeUnit keepAliveTimeUnit,
                                                final int blockingQueueSize)
{
    // Bounded queue: caps the memory consumed by pending UnitOfWork tasks
    super(corePoolSize, maximumPoolSize, keepAliveTime, keepAliveTimeUnit,
          new LinkedBlockingQueue<Runnable>(blockingQueueSize),
          new BulkFilesystemImporterThreadFactory());

    if (log.isDebugEnabled())
        log.debug("Creating new bulk import thread pool." +
                  "\n\tcorePoolSize      = " + corePoolSize +
                  "\n\tmaximumPoolSize   = " + maximumPoolSize +
                  "\n\tkeepAliveTime     = " + keepAliveTime + " " + String.valueOf(keepAliveTimeUnit) +
                  "\n\tblockingQueueSize = " + blockingQueueSize);

    // When the queue is full, run the task on the submitting thread instead of rejecting it
    this.setRejectedExecutionHandler(new CallerRunsPolicy());
}

For convenience during testing, I exposed the maxCapacity through the constructor of the BulkFilesystemImporterThreadPoolExecutor class, the Spring config, and alfresco-global.properties.
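The effect of the fix can be sketched with a small standalone example (illustrative only, not importer code): with a bounded queue and CallerRunsPolicy, a submission that finds the queue full is not rejected but executed on the submitting thread, which throttles the producer instead of letting the queue grow without bound.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

public class CallerRunsDemo {
    public static void main(String[] args) throws Exception {
        // One worker, queue capacity 1: the third submission finds the queue full.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        CountDownLatch block = new CountDownLatch(1);
        AtomicBoolean ranOnCaller = new AtomicBoolean(false);
        String caller = Thread.currentThread().getName();

        // Task 1 occupies the single worker thread until released.
        pool.execute(() -> { try { block.await(); } catch (InterruptedException ignored) {} });
        // Task 2 fills the queue (capacity 1).
        pool.execute(() -> {});
        // Task 3 is "rejected" -> CallerRunsPolicy runs it on the caller (main) thread.
        pool.execute(() -> ranOnCaller.set(Thread.currentThread().getName().equals(caller)));

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("third task ran on caller thread: " + ranOnCaller.get());
        // prints: third task ran on caller thread: true
    }
}
```

Because the caller thread is busy running the overflow task, it cannot submit new work, so the queue depth stays bounded by blockingQueueSize.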

Original issue reported on code.google.com by sahli.al...@gmail.com on 30 Jul 2012 at 10:08

GoogleCodeExporter commented 9 years ago
Thanks for the fix!  I'll merge it into the next version of the tool.

Original comment by pmo...@gmail.com on 28 Aug 2012 at 11:43

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 887d5de11a79.

Original comment by pmo...@gmail.com on 9 Sep 2012 at 10:10

GoogleCodeExporter commented 9 years ago
Hi Peter!

Why did you not commit this fix in the 3x branch?

Regards, Alain

Original comment by sahli.al...@gmail.com on 10 Sep 2012 at 10:16

GoogleCodeExporter commented 9 years ago
G'day Alain,

I'm planning on merging it back as part of the work on issue 110 [1]
(which also requires some back-merging).

Cheers,
Peter

[1]
http://code.google.com/p/alfresco-bulk-filesystem-import/issues/detail?id=110


Original comment by pmo...@gmail.com on 10 Sep 2012 at 5:33

GoogleCodeExporter commented 9 years ago
Ok, thanks for the quick answer!

Cheers,
Alain

Original comment by sahli.al...@gmail.com on 10 Sep 2012 at 5:35

GoogleCodeExporter commented 9 years ago
Note: I had to reverse this change after noticing some problems around 
out-of-order execution and difficulties in stopping the thread pool.  Note the 
following text from the Javadocs [1] (emphasis added):

"New tasks submitted in method execute(java.lang.Runnable) will be rejected 
_*when the Executor has been shut down*_, and also when the Executor uses 
finite bounds for both maximum threads and work queue capacity, and is 
saturated."

[1] http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
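The out-of-order behaviour is inherent to CallerRunsPolicy: when the queue is full, the submitting thread executes the task itself, so that task can complete before tasks that were queued earlier. A minimal illustration (not importer code) of a later task "jumping the queue":

```java
import java.util.List;
import java.util.concurrent.*;

public class OutOfOrderDemo {
    public static void main(String[] args) throws Exception {
        // One worker, queue capacity 1, CallerRunsPolicy on overflow.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        List<Integer> order = new CopyOnWriteArrayList<>();
        CountDownLatch block = new CountDownLatch(1);

        // Task 1 holds the worker thread until released, then records itself.
        pool.execute(() -> { try { block.await(); } catch (InterruptedException ignored) {} order.add(1); });
        // Task 2 waits in the queue.
        pool.execute(() -> order.add(2));
        // Task 3 overflows the queue and runs on the caller thread, finishing FIRST.
        pool.execute(() -> order.add(3));

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(order);
        // prints: [3, 1, 2] -- task 3 completed before the earlier tasks
    }
}
```

For a strictly sequential importer this reordering is harmless only if units of work are independent; the shutdown/rejection interaction quoted above is the other reason the change was reverted.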

Original comment by pmo...@gmail.com on 3 Dec 2013 at 6:18

GoogleCodeExporter commented 9 years ago
I think the only option if you're importing an exceptionally large folder 
hierarchy will be to throw more memory at Alfresco.

Original comment by pmo...@gmail.com on 3 Dec 2013 at 6:20