linagora / james-project

Mirror of Apache James Project
Apache License 2.0

OOM upon IMAP COPY #5201

Closed: chibenwa closed this issue 1 month ago

chibenwa commented 1 month ago

I encountered this in production:

java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.Arrays.copyOf(Unknown Source)
    at java.base/java.util.ArrayList.grow(Unknown Source)
    at java.base/java.util.ArrayList.grow(Unknown Source)
    at java.base/java.util.ArrayList.add(Unknown Source)
    at java.base/java.util.ArrayList.add(Unknown Source)
    at org.apache.james.mailbox.model.MessageRange.split(MessageRange.java:247)
    at org.apache.james.mailbox.store.MessageBatcher.batchMessagesReactive(MessageBatcher.java:70)
    at org.apache.james.mailbox.store.StoreMailboxManager.lambda$copyMessagesReactive$48(StoreMailboxManager.java:713)
    at org.apache.james.mailbox.store.StoreMailboxManager$$Lambda/0x00007f12613caab8.apply(Unknown Source)
    at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onNext(MonoFlatMapMany.java:163)
    at reactor.core.publisher.MonoZip$ZipCoordinator.signal(MonoZip.java:297)
    at reactor.core.publisher.MonoZip$ZipInner.onNext(MonoZip.java:478)
    at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122)
    at reactor.core.publisher.FluxSwitchIfEmpty$SwitchIfEmptySubscriber.onNext(FluxSwitchIfEmpty.java:74)
    at reactor.core.publisher.MonoZip$ZipCoordinator.signal(MonoZip.java:297)
    at reactor.core.publisher.MonoZip$ZipInner.onNext(MonoZip.java:478)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.secondComplete(MonoFlatMap.java:245)
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onNext(MonoFlatMap.java:305)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)
    at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:171)
    at reactor.core.publisher.MonoFlatMap$FlatMapInner.onSubscribe(MonoFlatMap.java:291)
    at reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onSubscribe(FluxMapFuseable.java:96)
    at reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)
    at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
    at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:165)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
    at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:122)
    at reactor.core.publisher.MonoPublishOn$PublishOnSubscriber.run(MonoPublishOn.java:181)
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
    at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
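The top frames show `MessageRange.split` adding to an `ArrayList` until the heap is exhausted. A minimal sketch of what such an eager split looks like, assuming it cuts a `[from, to]` UID range into consecutive sub-ranges of at most `batchSize` UIDs and collects all of them before any is processed. `UidRange` and `EagerSplitSketch.split` are hypothetical stand-ins, not the James classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a UID range; NOT org.apache.james.mailbox.model.MessageRange.
record UidRange(long from, long to) {}

class EagerSplitSketch {
    // Eagerly materializes every sub-range up front. The list size grows with
    // (to - from + 1) / batchSize, independently of how many messages exist in that range.
    static List<UidRange> split(long from, long to, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be greater than zero");
        }
        List<UidRange> ranges = new ArrayList<>();
        long start = from;
        while (start <= to) {
            // Clamp the last batch to 'to'; comparing a difference avoids overflow near Long.MAX_VALUE.
            long end = (to - start < batchSize) ? to : start + batchSize - 1;
            ranges.add(new UidRange(start, end)); // analogous to the ArrayList.add frames in the trace
            if (end == to) {
                break; // stop before computing end + 1, which could overflow
            }
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(split(1, 25, 10));
    }
}
```

Running `main` prints three sub-ranges (1 to 10, 11 to 20, 21 to 25); with a huge range and a small batch size, the same loop keeps growing the list until the heap runs out.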

I was able to reproduce it:

[image: reproduction]

This was encountered with the following batch sizes:

copy=10
move=10

Aggressively increasing the batch sizes turned out to be a useful workaround:

copy=2000000000
move=2000000000

However, I fear this means the overall batching process for MOVE and COPY makes little sense...
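The workaround is consistent with the number of sub-ranges an eager split has to keep in memory, roughly ceil(span / batchSize). A back-of-the-envelope sketch, assuming a worst-case range covering the whole 32-bit IMAP UID space (1 to 4294967295, the largest UID RFC 3501 allows); the actual range behind the production OOM is not stated in this report, and `countBatches` is a hypothetical helper:

```java
class BatchCountSketch {
    // Number of sub-ranges of at most batchSize UIDs needed to cover [from, to],
    // i.e. ceil(span / batchSize) computed with integer arithmetic.
    static long countBatches(long from, long to, long batchSize) {
        long span = to - from + 1;
        return (span + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long maxImapUid = 4_294_967_295L; // largest 32-bit unsigned value usable as an IMAP UID
        System.out.println(countBatches(1, maxImapUid, 10));             // copy=10: prints 429496730
        System.out.println(countBatches(1, maxImapUid, 2_000_000_000L)); // copy=2000000000: prints 3
    }
}
```

Under that assumption, copy=10 means about 430 million range objects held at once versus 3 for copy=2000000000, which would explain why the larger setting avoids the OOM without changing the underlying eager behaviour.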

I do think this could be handled in a purely reactive way:

Caused by: java.lang.IllegalArgumentException: 'copyBatchSize' must be greater than zero
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
    at org.apache.james.mailbox.store.BatchSizes$Builder.copyBatchSize(BatchSizes.java:86)
    at org.apache.james.modules.mailbox.CassandraSessionModule.getBatchSizesConfiguration(CassandraSessionModule.java:109)

:-(
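One reading of handling this "in a purely reactive way" is to generate sub-ranges on demand instead of collecting them into a list first. The sketch below does this with a lazy `java.util.stream.Stream` to stay dependency-free; inside James the equivalent would presumably be a Reactor `Flux` (for example built with `Flux.generate`), which is an assumption and not necessarily what the linked PR does. `UidRange` and `LazySplitSketch.lazySplit` are hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a UID range; NOT the James MessageRange class.
record UidRange(long from, long to) {}

class LazySplitSketch {
    // Produces sub-ranges one at a time as the consumer pulls them,
    // so memory no longer scales with the size of the requested range.
    static Stream<UidRange> lazySplit(long from, long to, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be greater than zero");
        }
        if (from > to) {
            return Stream.empty();
        }
        UidRange first = new UidRange(from, endOf(from, to, batchSize));
        // null marks the end, so r.to() + 1 is never computed once 'to' has been reached.
        return Stream.iterate(
                first,
                r -> r != null,
                r -> r.to() == to ? null : new UidRange(r.to() + 1, endOf(r.to() + 1, to, batchSize)));
    }

    // Last UID of the batch starting at 'start', clamped to 'to' without overflowing.
    private static long endOf(long start, long to, int batchSize) {
        return (to - start < batchSize) ? to : start + batchSize - 1;
    }

    public static void main(String[] args) {
        // Only the first three ranges are generated here, although the full
        // eager split of this range would need about 430 million of them.
        List<UidRange> firstThree = lazySplit(1, 4_294_967_295L, 10)
                .limit(3)
                .collect(Collectors.toList());
        System.out.println(firstThree);
    }
}
```

The design choice is simply to let the downstream consumer drive how many batches exist at any moment; a reactive pipeline gets the same property from backpressure.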

chibenwa commented 1 month ago

CF https://issues.apache.org/jira/browse/JAMES-4041

chibenwa commented 1 month ago

PR: https://github.com/apache/james-project/pull/2265

Sorry, but as this issue is very bad, I really did not want it to hang around for long...