Closed Arsnael closed 2 weeks ago
Investigating... Latest work is abot pool connection. Seems to alter the server responses order somehow on the deployment tests.
I dont get why on imap mpt tests there is some tests with flag updates on multiple messages and it's fine, while deployment tests somehow the response order is altered.
Does not happen before the migration to pg pool connection. @hungphan227 any ideas perhaps?
I have reverted james sub module back to a commit before pool:
Then ran the test, it still fail:
org.apache.james.mpt.protocol.ProtocolSession$InvalidServerResponseException:
Location: /org/apache/james/imap/scripts/ValidateDeployment.test:299
LastClientMsg: A15 STORE 1:2 +FLAGS (\FLAGGED)
Expected: '\* 1 FETCH \(FLAGS \(\\Flagged \\Recent\)\)'
Actual : '* 2 FETCH (FLAGS (\Flagged \Recent))'
@hungphan227 this looks interesting....
git bisect
to find out the one commit that introduced the bug...I suggest you use git bisect to find out the one commit that introduced the bug...
Actually did... For the commit that commented Hung, it was working for me somehow. The work on the pool was my bisect bad.
@hungphan227 the deployment tests are also based on the local image built with jib during compilation. Just to be sure, when you reverted, did you recompile the all project and rebuilt the image with jib before relaunching the test? If not you likely retested on this work.
Does only the PGSQL backend fails or all backend fails?
Only pgsql
Do we have a way to reproduce the failure in James code base?
This would make it way easier to treat and manage the issue...
Do we have a way to reproduce the failure in James code base?
As said I tried using the similar scripts with mpt imap tests. In PostgresSelectedState, based on SelectedState test, some tests rely on the Store.test script, that has a store operation a bit similar. But it is green.
Tried as well tinkering the script a bit storing flags in more messages (like storing the Answered flag from message 1:4) but still green...
Thought maybe then it's TMail? But The binding of the pool is done in PostgresCommonModule that is bound already in tmailPostgresServer as well...
fix => https://github.com/apache/james-project/pull/2259
Was painful to do debugging by add logs everywhere in that part of the code, rebuilding the app and running the tests checking the logs... But I pinned it down to that line. concatMap instead of flatMap at that line ensure we keep the right uid order for the fetch responses of the store imap command
Fix merged, let's see if the CI now is happy
Can reproduce locally. Some things are wrong with postgres integration tests on tmail now
I have fixed some tests in TeamMailboxesContract. PR: https://github.com/apache/james-project/pull/2267 However, there is still one test I could not fix: webSocketShouldPushTeamMailboxStateChanges
I have fixed some tests in TeamMailboxesContract. PR: https://github.com/apache/james-project/pull/2267
Are you sure this is not some isolation issue in the tests instead?
I have fixed some tests in TeamMailboxesContract. PR: apache/james-project#2267
Are you sure this is not some isolation issue in the tests instead?
"Isolation issue" -> what do you mean?
"Isolation issue" -> what do you mean?
That I think you get some key violation constraints because of entries already present and not cleaned up well from other tests.
"Isolation issue" -> what do you mean?
That I think you get some key violation constraints because of entries already present and not cleaned up well from other tests.
That is not possible. We always drop all tables after each test
That is not possible. We always drop all tables after each test
Last commit, works for me
EDIT: it does not fix issues with websocket though, that's an other issue
I have fixed some tests in TeamMailboxesContract. PR: https://github.com/apache/james-project/pull/2267
cc @hungphan227 can you check if it is the case for label change (TMail side)?
I have fixed some tests in TeamMailboxesContract. PR: apache/james-project#2267
cc @hungphan227 can you check if it is the case for label change (TMail side)?
If Rene commit works, my PR is not needed. However, I did not see any real difference made in Rene commit. Really surprised if it actually work
If Rene commit works, my PR is not needed.
Is upsert supported by the Cassandra implementation?
If Rene commit works, my PR is not needed.
Is upsert supported by the Cassandra implementation?
Yes
Yes
Then we still need your upsert fix IMO.
Ok with that
Yes
Then we still need your upsert fix IMO.
Does Cassandra impl supporting upsert mean that postgres impl should support too?
All implems should have similar behavior on this kind of operation
Very unstable...
Sometimes some tests pass and sometimes not. My guess is there is a slight chance maybe with the email and mailbox change that there is two requests on different pool threads being executed in parallel with the same state, thus why sometimes postgres throws an exception on duplicated primary key entry?
A wild guess... Hard to debug honestly, as most of the time when I try to play a test alone it's green u_u
Very unstable...
Sometimes some tests pass and sometimes not. My guess is there is a slight chance maybe with the email and mailbox change that there is two requests on different pool threads being executed in parallel with the same state, thus why sometimes postgres throws an exception on duplicated primary key entry?
A wild guess... Hard to debug honestly, as most of the time when I try to play a test alone it's green u_u
Which test? Websocket?
Not always the same honestly. Once I even got a green build, to tell you how unstable it is
Not always the same honestly. Once I even got a green build, to tell you how unstable it is
Hi. What is the name of the unstable test?
You launch the postgres-jmap-integration-tests suite and you will see locally.
Tests failing recurrently for me are :
pushShouldSupportFilterTypeNameAndFilterStateWhenDataTypesAreNull
from LinagoraFilterStateChangeTestwebSocketShouldPushTeamMailboxStateChanges
from TeamMailboxesContractBut for example on my run now it seems it gets stuck on contactShouldBeIndexedWhenMailing in LinagoraContactAutocompleteMethodContract
I saw other tests sometimes red... and those tests sometimes being green.
So enjoy :)
FYI all tmail postgres-jmap-integration-tests
passed for me, except webSocketShouldPushTeamMailboxStateChanges
.
Error log:
16:09:28.421 [ERROR] o.a.j.e.d.InVmEventDelivery - Error while processing listener
reactor.core.publisher.Sinks$EmissionException: Spec. Rule 1.3 - onSubscribe, onNext, onError and onComplete signaled to a Subscriber MUST be signaled serially.
at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:56)
at org.apache.james.jmap.change.StateChangeListener.$anonfun$reactiveEvent$2(StateChangeListener.scala:37)
at org.apache.james.jmap.change.StateChangeListener.$anonfun$reactiveEvent$2$adapted(StateChangeListener.scala:37)
at scala.Option.foreach(Option.scala:437)
at org.apache.james.jmap.change.StateChangeListener.$anonfun$reactiveEvent$1(StateChangeListener.scala:37)
at org.apache.james.jmap.change.StateChangeListener.$anonfun$reactiveEvent$1$adapted(StateChangeListener.scala:35)
at reactor.core.publisher.MonoCallable$MonoCallableSubscription.request(MonoCallable.java:137)
at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onSubscribe(MonoIgnoreElements.java:72)
at reactor.core.publisher.MonoCallable.subscribe(MonoCallable.java:48)
at reactor.core.publisher.Mono.subscribe(Mono.java:4512)
at reactor.core.publisher.MonoIgnorePublisher.subscribe(MonoIgnorePublisher.java:57)
at reactor.core.publisher.MonoUsing.subscribe(MonoUsing.java:109)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
at reactor.core.publisher.Mono.subscribe(Mono.java:4512)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:430)
at reactor.core.publisher.FluxIterable$IterableSubscription.slowPath(FluxIterable.java:335)
at reactor.core.publisher.FluxIterable$IterableSubscription.request(FluxIterable.java:294)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onSubscribe(FluxFlatMap.java:373)
at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:201)
at reactor.core.publisher.FluxIterable.subscribe(FluxIterable.java:83)
at reactor.core.publisher.Mono.subscribe(Mono.java:4512)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onNext(FluxFlatMap.java:430)
at reactor.core.publisher.FluxArray$ArraySubscription.slowPath(FluxArray.java:126)
at reactor.core.publisher.FluxArray$ArraySubscription.request(FluxArray.java:99)
at reactor.core.publisher.FluxFlatMap$FlatMapMain.onSubscribe(FluxFlatMap.java:373)
at reactor.core.publisher.FluxMerge.subscribe(FluxMerge.java:73)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:241)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:204)
at reactor.core.publisher.MonoUsing$MonoUsingSubscriber.onComplete(MonoUsing.java:283)
at reactor.core.publisher.MonoUsingWhen$MonoUsingWhenSubscriber.deferredComplete(MonoUsingWhen.java:271)
at reactor.core.publisher.FluxUsingWhen$CommitInner.onComplete(FluxUsingWhen.java:528)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260)
at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onComplete(Operators.java:2231)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:210)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:210)
at reactor.pool.SimpleDequePool.maybeRecycleAndDrain(SimpleDequePool.java:537)
at reactor.pool.SimpleDequePool$QueuePoolRecyclerInner.onComplete(SimpleDequePool.java:767)
at reactor.core.publisher.Operators.complete(Operators.java:137)
at reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:46)
at reactor.core.publisher.Mono.subscribe(Mono.java:4512)
at reactor.pool.SimpleDequePool$QueuePoolRecyclerMono.subscribe(SimpleDequePool.java:879)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:241)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:204)
at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:260)
at reactor.core.publisher.Operators.complete(Operators.java:137)
at reactor.core.publisher.MonoEmpty.subscribe(MonoEmpty.java:46)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:264)
at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:241)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.onComplete(MonoIgnoreThen.java:204)
at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:89)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onComplete(FluxHandleFuseable.java:239)
at reactor.core.publisher.MonoSupplier$MonoSupplierSubscription.request(MonoSupplier.java:148)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.request(FluxHandleFuseable.java:260)
at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onSubscribe(MonoIgnoreElements.java:72)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onSubscribe(FluxHandleFuseable.java:164)
at reactor.core.publisher.MonoSupplier.subscribe(MonoSupplier.java:48)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
at reactor.core.publisher.MonoIgnoreThen$ThenIgnoreMain.subscribeNext(MonoIgnoreThen.java:264)
at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51)
at reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:76)
at reactor.core.publisher.MonoDefer.subscribe(MonoDefer.java:53)
at reactor.core.publisher.FluxUsingWhen$UsingWhenSubscriber.onComplete(FluxUsingWhen.java:385)
at reactor.core.publisher.MonoIgnoreElements$IgnoreElementsSubscriber.onComplete(MonoIgnoreElements.java:89)
at reactor.core.publisher.SerializedSubscriber.onComplete(SerializedSubscriber.java:146)
at reactor.core.publisher.FluxRetryWhen$RetryWhenMainSubscriber.onComplete(FluxRetryWhen.java:203)
at reactor.core.publisher.MonoPeekTerminal$MonoTerminalPeekSubscriber.onComplete(MonoPeekTerminal.java:299)
at reactor.core.publisher.SerializedSubscriber.onComplete(SerializedSubscriber.java:146)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.onComplete(FluxTimeout.java:235)
at reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onComplete(FluxHide.java:147)
at reactor.core.publisher.MonoNext$NextSubscriber.onComplete(MonoNext.java:102)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:83)
at reactor.core.publisher.MonoNext$NextSubscriber.onNext(MonoNext.java:82)
at org.jooq.impl.Internal$1.onNext(Internal.java:749)
at org.jooq.impl.R2DBC$Forwarding.onNext(R2DBC.java:242)
at org.jooq.impl.R2DBC$RowCountSubscriber.lambda$onNext$0(R2DBC.java:322)
at org.jooq.impl.Internal$1.onNext(Internal.java:749)
at reactor.core.publisher.StrictSubscriber.onNext(StrictSubscriber.java:89)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:194)
at reactor.core.publisher.Operators$BaseFluxToMonoOperator.completePossiblyEmpty(Operators.java:2097)
at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onComplete(MonoCollectList.java:118)
at reactor.core.publisher.FluxHandle$HandleSubscriber.onComplete(FluxHandle.java:223)
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onComplete(MonoFlatMapMany.java:260)
at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onComplete(FluxHandleFuseable.java:239)
at reactor.core.publisher.FluxFilterFuseable$FilterFuseableConditionalSubscriber.onComplete(FluxFilterFuseable.java:391)
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onComplete(FluxContextWrite.java:126)
at reactor.core.publisher.FluxPeekFuseable$PeekConditionalSubscriber.onComplete(FluxPeekFuseable.java:940)
at reactor.core.publisher.FluxPeekFuseable$PeekConditionalSubscriber.onComplete(FluxPeekFuseable.java:940)
at io.r2dbc.postgresql.util.FluxDiscardOnCancel$FluxDiscardOnCancelSubscriber.onComplete(FluxDiscardOnCancel.java:104)
at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onComplete(FluxDoFinally.java:128)
at reactor.core.publisher.FluxHandle$HandleSubscriber.onComplete(FluxHandle.java:223)
at reactor.core.publisher.FluxCreate$BaseSink.complete(FluxCreate.java:465)
at reactor.core.publisher.FluxCreate$BufferAsyncSink.drain(FluxCreate.java:871)
at reactor.core.publisher.FluxCreate$BufferAsyncSink.complete(FluxCreate.java:819)
at reactor.core.publisher.FluxCreate$SerializedFluxSink.drainLoop(FluxCreate.java:249)
at reactor.core.publisher.FluxCreate$SerializedFluxSink.drain(FluxCreate.java:215)
at reactor.core.publisher.FluxCreate$SerializedFluxSink.complete(FluxCreate.java:206)
at io.r2dbc.postgresql.client.ReactorNettyClient$Conversation.complete(ReactorNettyClient.java:668)
at io.r2dbc.postgresql.client.ReactorNettyClient$BackendMessageSubscriber.emit(ReactorNettyClient.java:934)
at io.r2dbc.postgresql.client.ReactorNettyClient$BackendMessageSubscriber.onNext(ReactorNettyClient.java:810)
at io.r2dbc.postgresql.client.ReactorNettyClient$BackendMessageSubscriber.onNext(ReactorNettyClient.java:716)
at reactor.core.publisher.FluxHandle$HandleSubscriber.onNext(FluxHandle.java:129)
at reactor.core.publisher.FluxPeekFuseable$PeekConditionalSubscriber.onNext(FluxPeekFuseable.java:854)
at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:224)
at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:224)
at reactor.netty.channel.FluxReceive.drainReceiver(FluxReceive.java:294)
at reactor.netty.channel.FluxReceive.onInboundNext(FluxReceive.java:403)
at reactor.netty.channel.ChannelOperations.onInboundNext(ChannelOperations.java:426)
at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:114)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:509)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:407)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:1583)
I changed the memory eventbus to RabbitMQ but still the quite similar error. I tested the Distributed/Memory IT but I do not see the same error. => This error log may be a good lead IMO.
I propose to disable the webSocketShouldPushTeamMailboxStateChanges
test and handle it in another ticket.
Problem is I had random errors on other tests, even locally. More than one test is unstable IMO
Problem is I had random errors on other tests, even locally. More than one test is unstable IMO
Hmm BTW unstable is the nature of those JMAP ITs IMO.
I also get the unstable on webSocketShouldPushTeamMailboxStateChanges
But the latest postgres is passed, it is unstable when commit upgrade james sha1
I try to disable the webSocketShouldPushTeamMailboxStateChanges
test here: https://github.com/linagora/tmail-backend/pull/1091
Let's see...
The issue comes from PoolBackedPostgresConnectionFactory
Commit: https://github.com/apache/james-project/commit/435600614fcfbc9bb2f4dccfea1b43a5069bd5a6#diff-29c9eefc2a62e1248232ac97ddadeaab789ffb85b10351e13e18aaa4e26c2195 (the change of PostgresCommonModule.java
)
I tried to revert it to SinglePostgresConnectionFactory
and webSocketShouldPushTeamMailboxStateChanges
passed.
It looks like an issue related to what we did with Reactor (not logic issue),
the StateChangeListener
already emits 2 messages, but the client subscriber receives only one.
Thanks for the help team, that should do it :)
Note that my fixup commit just makes test webSocketShouldPushTeamMailboxStateChanges
not hanging
It still unstable
We can @disable it as Quan's suggestion
// the 2 prs in james will help postgres-app better, (it doesn't fix the unstable)
It still unstable
Is it? Seemed okish to me :)
When I said other tests were unstable and not just the websocket one... https://james-jenkins.lin-saas.com/job/Tmail%20build/job/PR-1035/10/
T_T
https://james-jenkins.lin-saas.com/job/Tmail%20build/job/PR-1035/2/testReport/junit/com.linagora.tmail.deployment/PostgresImapAndSmtpTest/validateDeployment/
Investigating... Latest work is abot pool connection. Seems to alter the server responses order somehow on the deployment tests.