linagora / james-project

Mirror of Apache James Project
Apache License 2.0
72 stars 62 forks source link

[PGSQL] The reactor hanging issue with r2dbc-postgresql #5160

Open vttranlina opened 5 months ago

vttranlina commented 5 months ago

Related to the hanging issue with r2dbc-postgresql + JOOQ:

Example methods:

1: PostgresExecutor#executeCount 2: PostgresExecutor#executeDeleteAndReturnList 3: AttachmentPartRetriever

Updates:

  1. The hanging issue is not related to Jooq. Refer to the issue [Postgresql] Investigate why PopulateEmailQueryViewTask gets stuck while running I successfully resolved the issue by re-running Quan's lab with the addition of the JVM argument -Dreactor.bufferSize.small=100000. Both Jooq and Native SQL passed the test after this adjustment.

Note: While testing with Native SQL, an error was encountered: Cannot exchange messages because the request queue limit is exceeded, which has occurred sporadically in the past. However, no error log was generated when testing with Jooq, => We missed configure logger error for Jooq

  1. For hanging issue when DELETE ... RETURNING(deleteByMailboxId) The hanging issue persists when using native queries (without Jooq). I reported bug on r2dbc-postgresql repository here: https://github.com/pgjdbc/r2dbc-postgresql/issues/650

Reproduction Code:

public Flux<UUID> deleteByMailboxIdAndReturning(PostgresMailboxId mailboxId) {
        AtomicInteger counter = new AtomicInteger(0);
        AtomicInteger doOnNextCounter = new AtomicInteger(0);

        return postgresExecutor.connection()
            .flatMapMany(con -> Flux.from(con.createStatement("DELETE FROM message_mailbox WHERE mailbox_id = $1 RETURNING message_id")
                    .bind(0, mailboxId.asUuid())
                    .execute())
                .flatMap(result -> result.map((row, rowMetadata) -> {
                    String msgIdWasDeleted = row.get(0, String.class);
                    System.out.println("msg was deleted(" + counter.incrementAndGet() + "): " + msgIdWasDeleted);
                    return msgIdWasDeleted;
                }), 5,15))
            .doOnNext(e -> {
                System.out.println("____doOnNext("+doOnNextCounter.incrementAndGet()+"): " + e);
            })
            .map(UUID::fromString);
    }

Resolution: Resolved by replacing .collectList().flatMapIterable(list -> list) with .window(1).flatMap(flux -> flux). Both methods have been tested successfully. WDYT about new way?

vttranlina commented 5 months ago

Thread dump when running DeleteMessageListenerContract#deleteMessageListenerShouldSucceedWhenDeleteMailboxHasALotOfMessages with Reproduction Code above threads_report.txt

"deleteByMailboxIdAndReturning-1" prio=0 tid=0x0 nid=0x0 waiting on condition
     java.lang.Thread.State: WAITING
 on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@44e29989
    at java.base@21.0.2/jdk.internal.misc.Unsafe.park(Native Method)
    at java.base@21.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:371)
    at java.base@21.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:519)
    at java.base@21.0.2/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3780)
    at java.base@21.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3725)
    at java.base@21.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1707)
    at java.base@21.0.2/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1170)
    at java.base@21.0.2/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
    at java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1070)
    at java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.base@21.0.2/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base@21.0.2/java.lang.Thread.runWith(Thread.java:1596)
    at java.base@21.0.2/java.lang.Thread.run(Thread.java:1583)
vttranlina commented 5 months ago

Now I don't have any new ideas for this issue. Wait for the response from r2dbc-postgresql maintainer on the bug reported

chibenwa commented 5 months ago

Yes the threads are doing strictly nothing!

chibenwa commented 5 months ago

Could you rerun your example but print thread name at the start of your debug messages?

vttranlina commented 5 months ago

Could you rerun your example but print thread name at the start of your debug messages?

reactor-tcp-nio-1- msg was deleted(1): 018ed018-1402-7830-ac4f-626e05961402
reactor-tcp-nio-1____doOnNext(1): 018ed018-1402-7830-ac4f-626e05961402
reactor-tcp-nio-1- msg was deleted(3): 018ed018-14f3-7363-8111-6c8d0cb4f204
reactor-tcp-nio-1____doOnNext(2): 018ed018-14f3-7363-8111-6c8d0cb4f204
reactor-tcp-nio-1- msg was deleted(5): 018ed018-1517-7fe3-8c0d-f42ef83f6b51
reactor-tcp-nio-1____doOnNext(3): 018ed018-1517-7fe3-8c0d-f42ef83f6b51
reactor-tcp-nio-1- msg was deleted(7): 018ed018-1536-77a6-a157-80da658c0591
reactor-tcp-nio-1____doOnNext(4): 018ed018-1536-77a6-a157-80da658c0591
reactor-tcp-nio-1- msg was deleted(9): 018ed018-1551-7085-ba9f-d3ff0dbaa154
reactor-tcp-nio-1- msg was deleted(11): 018ed018-156f-7c5b-81e2-984b207faa39
reactor-tcp-nio-1- msg was deleted(13): 018ed018-158c-7248-8277-2ecde78841ee
reactor-tcp-nio-1- msg was deleted(15): 018ed018-15a5-7af8-af7c-fe0b661afc45
reactor-tcp-nio-1- msg was deleted(17): 018ed018-15be-759f-8789-868947c8b038
reactor-tcp-nio-1- msg was deleted(19): 018ed018-15da-71fd-aaad-eae1344ccc46
reactor-tcp-nio-1- msg was deleted(21): 018ed018-15f5-7f6a-b9ac-9d823fa26653
reactor-tcp-nio-1- msg was deleted(23): 018ed018-1613-76ed-aa05-6a535d1a6336
reactor-tcp-nio-1- msg was deleted(25): 018ed018-162a-763c-9342-61992c9ccdde
reactor-tcp-nio-1- msg was deleted(27): 018ed018-1640-7766-b433-aacd5f10e3d9
reactor-tcp-nio-1- msg was deleted(29): 018ed018-1658-7e51-9f28-bfa0a0c78183
vttranlina commented 5 months ago

Full debug log test-run1.log Search keyword: "PostgresMailboxMessageDAO"

vttranlina commented 5 months ago

I created a new simple maven project (independent of james) for easy reproduction, debugging https://github.com/vttranlina/r2dbc-postgresql-test.git Feel free to debug it if interested