operator-framework / java-operator-sdk

Java SDK for building Kubernetes Operators
https://javaoperatorsdk.io/
Apache License 2.0
807 stars 216 forks source link

ConcurrentModificationException inside getSecondaryResources #2567

Closed matteriben closed 1 week ago

matteriben commented 2 weeks ago

Bug Report

What did you do?

Called context.getSecondaryResources(resource) inside ResourceDiscriminator.

What did you expect to see?

Secondary resources.

What did you see instead? Under which circumstances?

Not sure the exact circumstances, but saw:

java.util.ConcurrentModificationException
    at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1792)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
    at io.javaoperatorsdk.operator.processing.event.EventSources.getEventSources(EventSources.java:184)
    at io.javaoperatorsdk.operator.processing.event.EventSourceManager.getResourceEventSourcesFor(EventSourceManager.java:228)
    at io.javaoperatorsdk.operator.api.reconciler.DefaultContext.getSecondaryResourcesAsStream(DefaultContext.java:50)
    at io.javaoperatorsdk.operator.api.reconciler.DefaultContext.getSecondaryResources(DefaultContext.java:40)

Environment

Kubernetes cluster type:

testing with kind via quarkus dev services.

io.quarkiverse.operatorsdk:quarkus-operator-sdk:6.7.2 io.javaoperatorsdk:operator-framework-core:4.9.2

Eclipse Temurin OpenJDK Runtime Environment 21.0.4+7-LTS

Possible Solution

Additional context

metacosm commented 2 weeks ago

Unfortunately, baring a JDK bug, I don't see anything in the code that would cause this problem since these operations are read-only. Let us know if you manage to replicate the issue with some additional context, please.

matteriben commented 2 weeks ago

I'm not sure I can replicate it easily. But I think I can easily demonstrate the underlying issue.

    @Test
    void testConcurrency() throws InterruptedException {
        AtomicBoolean done = new AtomicBoolean();
        while (!done.get()) {
            Phaser phaser = new Phaser(2);
            Map<Object, Object> map = new HashMap<>();
            for (int i = 0; i < 10; i++) {
                map.put(i, i);
            }
            Stream<Object> stream = map.values().stream();
            Thread thread1 = new Thread(() -> {
                phaser.arriveAndAwaitAdvance();
                for (int i = 10; i < 20; i++) {
                    map.put(i, i);
                }
            });
            Thread thread2 = new Thread(() -> {
                phaser.arriveAndAwaitAdvance();
                stream.collect(Collectors.toList());
            });
            thread2.setUncaughtExceptionHandler((t, e) -> {
                if (e instanceof ConcurrentModificationException) {
                    e.printStackTrace(System.out);
                    done.set(true);
                }
            });
            thread1.start();
            thread2.start();
            thread1.join();
            thread2.join();
        }
    }
java.util.ConcurrentModificationException
    at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1792)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
metacosm commented 2 weeks ago

OK so the issue would happen when your ResourceDiscriminator would be called while other event sources are still being initialized? If you are initializing your event sources manually, you could try to set the event source's priority to EventSourceStartPriority.RESOURCE_STATE_LOADER to see if this addresses the issue (you can see an example of this being used at https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/sample/externalstate/ExternalStateReconciler.java#L107). If you're not setting up sources manually, could you detail how they are set up, please? Maybe we could (should?) surface that priority setting?

matteriben commented 2 weeks ago

This reconciler is "managed", in the sense that dependent resources are added through annotations. No event sources for this reconciler are initialized manually.

This reconciler does however have six dependent resources with activation conditions, including the dependent resource whose ResourceDiscriminator was called when this exception was thrown. I could be wrong, but I suspect this may be related to the activation conditions.

AbstractWorkflowExecutor::registerOrDeregisterEventSourceBasedOnActivation

EventSourceManager::dynamicallyRegisterEventSource

EventSourceManager::registerEventSource

EventSources::add

csviri commented 2 weeks ago

Yes, event sources are added dynamically with activation condition, this will be the issue

csviri commented 1 week ago

Added fix for both main and v5, pls take a look if it makes sense.