radixdlt / olympia-node

Radix monorepo
Other
142 stars 35 forks source link

ConcurrentModificationException in PeerManager #507

Closed stuartbain closed 3 years ago

stuartbain commented 3 years ago

Validator node (version 1.0.0) crashed with following stack trace:

java.util.ConcurrentModificationException
        at java.base/java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1608)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:274)
        at java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3605)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at com.radixdlt.network.p2p.PeerManager.activeChannels(PeerManager.java:305)
        at com.radixdlt.network.p2p.PeerManagerPeersView.peers(PeerManagerPeersView.java:88)
        at com.radixdlt.sync.LocalSyncService.choosePeersForSyncCheck(LocalSyncService.java:265)
        at com.radixdlt.sync.LocalSyncService.initSyncCheck(LocalSyncService.java:251)
        at com.radixdlt.sync.LocalSyncService.lambda$new$0(LocalSyncService.java:189)
        at com.radixdlt.sync.LocalSyncService$Handler.handle(LocalSyncService.java:624)
        at com.radixdlt.sync.LocalSyncService.processEvent(LocalSyncService.java:581)
        at com.radixdlt.sync.LocalSyncService.lambda$syncCheckTriggerEventProcessor$41(LocalSyncService.java:541)
        at com.radixdlt.epochs.EpochsLocalSyncService.processSyncCheckTrigger(EpochsLocalSyncService.java:151)
        at io.reactivex.rxjava3.internal.observers.LambdaObserver.onNext(LambdaObserver.java:63)
        at io.reactivex.rxjava3.internal.operators.observable.ObservableObserveOn$ObserveOnObserver.drainNormal(ObservableObserveOn.java:201)
        at io.reactivex.rxjava3.internal.operators.observable.ObservableObserveOn$ObserveOnObserver.run(ObservableObserveOn.java:255)
        at io.reactivex.rxjava3.internal.schedulers.ExecutorScheduler$ExecutorWorker$BooleanRunnable.run(ExecutorScheduler.java:322)
        at io.reactivex.rxjava3.internal.schedulers.ExecutorScheduler$ExecutorWorker.runEager(ExecutorScheduler.java:287)
        at io.reactivex.rxjava3.internal.schedulers.ExecutorScheduler$ExecutorWorker.run(ExecutorScheduler.java:248)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
dcernahoschi commented 3 years ago

The issue seems to be here in the code:

https://github.com/radixdlt/radixdlt/blob/e943591e86517cd185d9f040e1aeb94dca34262e/radixdlt-core/radixdlt/src/main/java/com/radixdlt/network/p2p/PeerManager.java#L305

While both activeChannels map and the resulting immutable set are thread safe, the collector that builds the immutable set is not. And there is only one instance of this collector in guava library

I've opened a pull request for this: https://github.com/radixdlt/radixdlt/pull/515

LukasGasior1 commented 3 years ago

Fixed in: https://github.com/radixdlt/radixdlt/pull/469