palantir / atlasdb

Transactional Distributed Database Layer
https://palantir.github.io/atlasdb/
Apache License 2.0

Fix or Delete `LockWatchValueIntegrationTest#valueStressTest` #6699

Open tpetracca opened 1 year ago

tpetracca commented 1 year ago

This test is a plague on the repo. It flakes constantly. This is bad for a variety of reasons:

I've proposed ignoring the test outright, but someone should follow up with either a true fix or by deleting the test.

tpetracca commented 1 year ago

I can see 6 failures in just the last 5 hours in CircleCI (across a variety of branches, including develop): https://app.circleci.com/pipelines/github/palantir/atlasdb

Examples:

The failure is always:

```
java.lang.AssertionError: Encountered nonretriable exception
    at com.palantir.atlasdb.timelock.LockWatchValueIntegrationTest.valueStressTest(LockWatchValueIntegrationTest.java:567)
    .... stuff ....
Caused by: java.util.concurrent.ExecutionException: com.palantir.logsafe.exceptions.SafeRuntimeException: Fallback cache threw an exception
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at com.palantir.atlasdb.timelock.LockWatchValueIntegrationTest.valueStressTest(LockWatchValueIntegrationTest.java:564)
    ... 56 more
Caused by: com.palantir.logsafe.exceptions.SafeRuntimeException: Fallback cache threw an exception
    at app//com.palantir.atlasdb.keyvalue.api.ResilientLockWatchProxy.handleException(ResilientLockWatchProxy.java:88)
    at app//com.palantir.atlasdb.keyvalue.api.ResilientLockWatchProxy.handleInvocation(ResilientLockWatchProxy.java:77)
    at app//com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
    at app//com.sun.proxy.$Proxy184.onSuccessfulCommit(Unknown Source)
    at app//com.palantir.lock.watch.LockWatchCacheImpl.onTransactionCommit(LockWatchCacheImpl.java:58)
    at app//com.palantir.atlasdb.keyvalue.api.watch.LockWatchManagerImpl.onTransactionCommit(LockWatchManagerImpl.java:138)
    at app//com.palantir.atlasdb.transaction.impl.SnapshotTransactionManager.lambda$startTransactions$4(SnapshotTransactionManager.java:222)
    at java.base@11.0.19/java.util.concurrent.CopyOnWriteArrayList.forEach(CopyOnWriteArrayList.java:807)
    at app//com.palantir.atlasdb.transaction.impl.SnapshotTransaction$SuccessCallbackManager.runCallbacks(SnapshotTransaction.java:2885)
    at app//com.palantir.atlasdb.transaction.impl.SnapshotTransaction.runSuccessCallbacksIfDefinitivelyCommitted(SnapshotTransaction.java:1928)
    at app//com.palantir.atlasdb.transaction.impl.AbstractTransactionManager.runTaskThrowOnConflictWithCallback(AbstractTransactionManager.java:85)
    at app//com.palantir.atlasdb.transaction.impl.SnapshotTransactionManager$OpenTransactionImpl.finishWithCallback(SnapshotTransactionManager.java:270)
    at app//com.palantir.atlasdb.transaction.impl.SnapshotTransactionManager.runTaskWithConditionThrowOnConflict(SnapshotTransactionManager.java:186)
    at app//com.palantir.atlasdb.transaction.impl.SerializableTransactionManager.runTaskWithConditionThrowOnConflict(SerializableTransactionManager.java:58)
    at app//com.palantir.atlasdb.transaction.impl.AbstractConditionAwareTransactionManager.runTaskThrowOnConflict(AbstractConditionAwareTransactionManager.java:57)
    at com.palantir.tritium.proxy.InstrumentedTransactionManager$18.runTaskThrowOnConflict(Unknown Source)
    at app//com.palantir.atlasdb.timelock.LockWatchValueIntegrationTest.randomTransactionTask(LockWatchValueIntegrationTest.java:578)
    at java.base@11.0.19/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base@11.0.19/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base@11.0.19/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base@11.0.19/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base@11.0.19/java.lang.Thread.run(Thread.java:829)
Caused by: com.palantir.logsafe.exceptions.SafeIllegalStateException: Trying to cache a value which is either locked or is not equal to a currently cached value: {table=default.table, cell=Cell{rowName=617761697473, columnName=636f6c756d6e73}, oldValue=CacheEntry{status=LOCKED, value=com.palantir.atlasdb.keyvalue.api.cache.CacheValue@0}, newValue=CacheEntry{status=UNLOCKED, value=com.palantir.atlasdb.keyvalue.api.cache.CacheValue@0}}
    at app//com.palantir.logsafe.Preconditions.checkState(Preconditions.java:304)
    at app//com.palantir.atlasdb.keyvalue.api.cache.ValueStoreImpl.lambda$putValue$2(ValueStoreImpl.java:98)
    at app//io.vavr.collection.Maps.put(Maps.java:231)
    at app//io.vavr.collection.HashMap.put(HashMap.java:757)
    at app//io.vavr.collection.HashMap.put(HashMap.java:40)
    at app//com.palantir.atlasdb.keyvalue.api.cache.ValueStoreImpl.lambda$putValue$3(ValueStoreImpl.java:97)
    at app//com.palantir.atlasdb.keyvalue.api.cache.StructureHolder.with(StructureHolder.java:36)
    at app//com.palantir.atlasdb.keyvalue.api.cache.ValueStoreImpl.putValue(ValueStoreImpl.java:97)
    at app//com.google.common.collect.RegularImmutableMap.forEach(RegularImmutableMap.java:297)
    at app//com.palantir.atlasdb.keyvalue.api.cache.LockWatchValueScopingCacheImpl$1.invalidateSome(LockWatchValueScopingCacheImpl.java:165)
    at app//com.palantir.atlasdb.keyvalue.api.cache.LockWatchValueScopingCacheImpl$1.invalidateSome(LockWatchValueScopingCacheImpl.java:139)
    at app//com.palantir.lock.watch.CommitUpdate$InvalidateSome.accept(CommitUpdate.java:73)
    at app//com.palantir.atlasdb.keyvalue.api.cache.LockWatchValueScopingCacheImpl.onSuccessfulCommit(LockWatchValueScopingCacheImpl.java:139)
    at jdk.internal.reflect.GeneratedMethodAccessor284.invoke(Unknown Source)
    at java.base@11.0.19/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base@11.0.19/java.lang.reflect.Method.invoke(Method.java:566)
    at app//com.palantir.atlasdb.keyvalue.api.ResilientLockWatchProxy.handleInvocation(ResilientLockWatchProxy.java:75)
    ... 20 more
```
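For anyone reading the trace: the innermost `Caused by` is a state check failing inside `ValueStoreImpl.putValue`, where the existing entry for the cell is `LOCKED` and the incoming entry is `UNLOCKED`. Below is a minimal sketch of that kind of invariant, to make the error message easier to interpret. It is not AtlasDB's code: `ValueStoreSketch`, `Entry`, `lock`, and the string-keyed map are all hypothetical stand-ins, and the sketch says nothing about why the test reaches this state intermittently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a value cache; NOT the real ValueStoreImpl.
public class ValueStoreSketch {
    enum Status { LOCKED, UNLOCKED }

    // Hypothetical cache entry: a status plus the cached value.
    static final class Entry {
        final Status status;
        final String value;

        Entry(Status status, String value) {
            this.status = status;
            this.value = value;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Entry)) {
                return false;
            }
            Entry that = (Entry) other;
            return status == that.status && Objects.equals(value, that.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(status, value);
        }

        @Override
        public String toString() {
            return "Entry{status=" + status + ", value=" + value + "}";
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Marks a cell as locked, e.g. because a writer currently holds its lock.
    public void lock(String cell) {
        entries.put(cell, new Entry(Status.LOCKED, ""));
    }

    // Caches a value, but only if the cell is absent or already holds the
    // exact same unlocked entry. A locked entry never equals an unlocked one,
    // so caching into a locked cell is rejected.
    public void putValue(String cell, String value) {
        Entry candidate = new Entry(Status.UNLOCKED, value);
        Entry existing = entries.putIfAbsent(cell, candidate);
        if (existing != null && !existing.equals(candidate)) {
            throw new IllegalStateException(
                    "Trying to cache a value which is either locked or is not equal to a currently"
                            + " cached value: {cell=" + cell + ", oldValue=" + existing
                            + ", newValue=" + candidate + "}");
        }
    }

    public static void main(String[] args) {
        ValueStoreSketch store = new ValueStoreSketch();
        store.putValue("cellA", "v1");
        store.putValue("cellA", "v1"); // same value again: accepted

        store.lock("cellB");
        try {
            store.putValue("cellB", "v1"); // cell is locked: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this sketch the rejected call corresponds to the `oldValue=...LOCKED` / `newValue=...UNLOCKED` pair in the trace: a commit callback tries to cache a value for a cell that the store still considers locked.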