hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger
Apache License 2.0

Unit Test Failure: Stabilize VirtualMap reconnect tests #11507

Closed by imalygin 7 months ago

imalygin commented 9 months ago

Description

The following tests are failing intermittently:

VirtualMapReconnectTest » teacherAbortsReconnectOnFirstInternal

VirtualMapLargeReconnectTest » multipleAbortedReconnectsCanSucceed(int, int, int, int)[1]
VirtualMapLargeReconnectTest » multipleAbortedReconnectsCanSucceed(int, int, int, int)[3]

https://scans.gradle.com/s/p7b5275t7a4o2
https://scans.gradle.com/s/d6alj3i3wcreg

Steps to reproduce

Run the above-mentioned tests.

Additional context

No response

Hedera network

other

Version

v0.48

Operating system

None

imalygin commented 8 months ago

Unfortunately, the issue has reoccurred:

It failed on this PR: https://github.com/hashgraph/hedera-services/pull/11798/files

imalygin commented 8 months ago

This test had multiple reasons to fail. This one

https://scans.gradle.com/s/d6alj3i3wcreg/tests/task/:swirlds-merkle:timingSensitive/details/com.swirlds.virtual.merkle.reconnect.VirtualMapLargeReconnectTest/multipleAbortedReconnectsCanSucceed(int%2C%20int%2C%20int%2C%20int)%5B1%5D?top-execution=1

is fixed by PR https://github.com/hashgraph/hedera-services/pull/11718 and PR https://github.com/hashgraph/hedera-services/pull/12039

This one

https://scans.gradle.com/s/d6alj3i3wcreg/tests/task/:swirlds-merkle:timingSensitive/details/com.swirlds.virtual.merkle.reconnect.VirtualMapLargeReconnectTest/multipleAbortedReconnectsCanSucceed(int%2C%20int%2C%20int%2C%20int)%5B1%5D?top-execution=1

requires further investigation. During the reconnect, the teacher suddenly receives a java.net.SocketException ("Socket closed"):

    Caused by: com.swirlds.common.merkle.synchronization.utility.MerkleSynchronizationException: Synchronization failed with exceptions 
        at com.swirlds.common.merkle.synchronization.TeachingSynchronizer.sendTree(TeachingSynchronizer.java:199)   
        at com.swirlds.common.merkle.synchronization.TeachingSynchronizer.synchronize(TeachingSynchronizer.java:136)    
        at com.swirlds.common.test.fixtures.merkle.util.MerkleTestUtils.teachingSynchronizerThread(MerkleTestUtils.java:957)    
        at com.swirlds.common.test.fixtures.merkle.util.MerkleTestUtils.lambda$testSynchronization$4(MerkleTestUtils.java:1038) 
        at com.swirlds.common.threading.pool.StandardWorkGroup.lambda$execute$1(StandardWorkGroup.java:142) 
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)    
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)   
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)    
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)    
        at java.base/java.lang.Thread.run(Thread.java:1583) 
    Caused by: java.util.concurrent.ExecutionException: com.swirlds.common.merkle.synchronization.utility.MerkleSynchronizationException: Failed to deserialize object with class ID 8989011001974646263(0x7CBF61E166C6E5F7) (class com.swirlds.common.merkle.synchronization.internal.QueryResponse)   
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)    
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)   
        at com.swirlds.common.threading.futures.ConcurrentFuturePool.lambda$waitForCompletion$2(ConcurrentFuturePool.java:154)  
        at java.base/java.util.concurrent.ConcurrentLinkedQueue.forEachFrom(ConcurrentLinkedQueue.java:1037)    
        at java.base/java.util.concurrent.ConcurrentLinkedQueue.forEach(ConcurrentLinkedQueue.java:1054)    
        at com.swirlds.common.threading.futures.ConcurrentFuturePool.waitForCompletion(ConcurrentFuturePool.java:147)   
        at com.swirlds.common.threading.pool.StandardWorkGroup.waitForTermination(StandardWorkGroup.java:156)   
        at com.swirlds.common.merkle.synchronization.TeachingSynchronizer.sendTree(TeachingSynchronizer.java:195)   
        ... 9 more  
    Caused by: com.swirlds.common.merkle.synchronization.utility.MerkleSynchronizationException: Failed to deserialize object with class ID 8989011001974646263(0x7CBF61E166C6E5F7) (class com.swirlds.common.merkle.synchronization.internal.QueryResponse)    
        at com.swirlds.common.merkle.synchronization.streams.AsyncInputStream.run(AsyncInputStream.java:157)    
        ... 6 more  
    Caused by: java.net.SocketException: Socket closed  
        at java.base/sun.nio.ch.NioSocketImpl.endRead(NioSocketImpl.java:243)   
        at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323)  
        at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:346)  
        at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:796)

The reason for its occurrence remains to be seen.
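For reference, here is a minimal sketch of one way a java.net.SocketException like the one in the trace above can arise: one thread is blocked in read() and another thread closes the same socket. This is not the reconnect code and not a diagnosis. The class name SocketClosedDemo, the method closeWhileReading, and the 200 ms sleep are all hypothetical, chosen only for illustration. The top frame in the trace (NioSocketImpl.endRead) is consistent with a close that happened while a read was already in progress, but that is an assumption until confirmed in the test itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of hedera-services.
public class SocketClosedDemo {

    /**
     * Starts a reader thread that blocks in read(), closes the socket from the
     * calling thread, and returns the exception the reader observed (or null).
     */
    static Throwable closeWhileReading() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            AtomicReference<Throwable> seen = new AtomicReference<>();
            CountDownLatch aboutToRead = new CountDownLatch(1);

            Thread reader = new Thread(() -> {
                try {
                    InputStream in = accepted.getInputStream();
                    aboutToRead.countDown();
                    in.read(); // blocks: the peer never sends any data
                } catch (IOException e) {
                    seen.set(e); // expected: a SocketException caused by the close below
                }
            }, "reader");
            reader.start();

            aboutToRead.await();
            Thread.sleep(200);  // crude delay so the reader is likely blocked in read()
            accepted.close();   // closing from another thread aborts the pending read
            reader.join();
            return seen.get();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(closeWhileReading());
    }
}
```

If the reconnect test tears down the teacher's or learner's connection while the other side's AsyncInputStream is still reading, the same kind of exception would surface. Checking which thread closes the socket, and when, may be a useful starting point for the investigation.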