linagora / tmail-backend

GNU Affero General Public License v3.0

ISSUE-1099 Stabilize JMAP Distributed tests #1101

Closed quantranhong1999 closed 2 weeks ago

quantranhong1999 commented 2 weeks ago

I could not reproduce https://james-jenkins.lin-saas.com/blue/rest/organizations/jenkins/pipelines/Tmail%20build/branches/PR-1098/runs/2/log/?start=0 locally, so it is a bit hard to debug.

From my attempts to reproduce it: concurrent access to DockerCassandraSingleton (e.g. one tries to stop it while another tries to start it) likely causes the static initializer block of the DockerCassandraSingleton class to fail with ExceptionInInitializerError -> the DockerCassandraSingleton class cannot be initialized properly -> the test hangs.

The solution (hopefully it works) is to abort hanging tests on ExceptionInInitializerError and retry them.
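As background for the failure chain described above, here is a minimal sketch of how the JVM treats a class whose static initialization fails. `FlakySingleton` and `StaticInitFailureDemo` are hypothetical names used only for illustration; this is not the actual `DockerCassandraSingleton` code. The first access throws `ExceptionInInitializerError`, and the JVM never retries the initialization, so every later access in the same JVM throws `NoClassDefFoundError`.

```java
public class StaticInitFailureDemo {

    // Hypothetical stand-in for a singleton that starts a container
    // during static initialization.
    static class FlakySingleton {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final String ENDPOINT = start();

        private static String start() {
            // Simulates the container failing to start.
            throw new IllegalStateException("simulated container start failure");
        }

        static String endpoint() {
            return ENDPOINT;
        }
    }

    // Touches the singleton and reports which Throwable type the JVM raised.
    static String access() {
        try {
            return FlakySingleton.endpoint();
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First access runs the static initializer, which fails.
        System.out.println("first access:  " + access());
        // The class is now marked as erroneous; initialization is not retried.
        System.out.println("second access: " + access());
    }
}
```

Run in a fresh JVM, this prints `ExceptionInInitializerError` for the first access and `NoClassDefFoundError` for the second. One consequence is that a retry executed in the same JVM cannot recover such a class; only a new JVM gets another chance to initialize it.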

quantranhong1999 commented 2 weeks ago

1st build -> green

quantranhong1999 commented 2 weeks ago

2nd run -> seems to hang, but not with the previous error

quantranhong1999 commented 2 weeks ago

3rd run -> failed

[ERROR] Errors: 

[ERROR] com.linagora.tmail.james.DistributedLinagoraCalendarEventAcceptMethodTest.null

[ERROR]   Run 1: DistributedLinagoraCalendarEventAcceptMethodTest » AllNodesFailed Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=192.168.0.1/<unresolved>:54132, hostId=null, hashCode=46384550): [com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s68|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (io.netty.channel.StacklessClosedChannelException)]

[ERROR]   Run 2: DistributedLinagoraCalendarEventAcceptMethodTest » AllNodesFailed Could not reach any contact point, make sure you've provided valid addresses (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=192.168.0.1/<unresolved>:54132, hostId=null, hashCode=7c3c9e19): [com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s74|control|connecting...] Protocol initialization request, step 1 (OPTIONS): failed to send request (io.netty.channel.StacklessClosedChannelException)]

It seems that retrying the tests did not help: Cassandra still cannot be reached.

BTW, I noticed that the build usually fails on node ci-james-03, while it succeeds on ci-james-06.

I will try lowering the fork usage to see...

quantranhong1999 commented 2 weeks ago

1st build (ci-james-06) with reuseForks disabled: [INFO] Team-mail :: Integration Tests :: JMAP :: Distributed SUCCESS [20:19 min]

vttranlina commented 2 weeks ago

Can we test with reuseForks=true, forkCount=1?

quantranhong1999 commented 2 weeks ago

> Can we test with reuseForks=true, forkCount=1?

There you go: https://github.com/linagora/tmail-backend/pull/1102/commits/3fa6aa4ea9111568e39e3388685f9e005b7fa81b
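The linked commit is not reproduced here. As a rough sketch, the two fork parameters discussed in this thread are set in the `maven-surefire-plugin` configuration. The values below mirror the question and are an assumption about the shape of the change, not a copy of it.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Assumed values, taken from the question in this thread -->
    <forkCount>1</forkCount>
    <reuseForks>true</reuseForks>
  </configuration>
</plugin>
```

With `forkCount=1` and `reuseForks=true`, Surefire runs all test classes of the module one after another in a single forked JVM, so static state is shared between them. With `reuseForks=false`, each test class gets a fresh JVM.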

quantranhong1999 commented 2 weeks ago

2nd build (on ci-james-03): [INFO] Team-mail :: Integration Tests :: JMAP :: Distributed SUCCESS [20:07 min]

It seems to be more stable.