operator-framework / java-operator-sdk

Java SDK for building Kubernetes Operators
https://javaoperatorsdk.io/
Apache License 2.0
795 stars 214 forks source link

Timing issue at start with AbstractEventSource #417

Closed shawkins closed 2 years ago

shawkins commented 3 years ago

This is a minor issue. Event sources can start receiving events before the controller init is called. This requires a null check of the eventHandler or you'll get npe exceptions for a while. Perhaps AbstractEventSource could implement EventHandler and take care of the relevant null checks.

Related to this should the AbstractEventSource.eventHandler field be volatile?

metacosm commented 3 years ago

Is this still going on in the latest version?

shawkins commented 3 years ago

I haven't checked. The event source is an application scoped bean. Events can start flowing as a side effect of a quarkus onStart hook. Should the operator init be guaranteed to happen before that? And/or do you want this as a quarkus specific issue?

lburgazzoli commented 3 years ago

I think now you should override EventSource.start and EventSource.close for a proper lifecycle, so i.e. watching for events starts only after that the event handler has been set.

metacosm commented 3 years ago

Can we close this issue?

shawkins commented 3 years ago

Can we close this issue?

Yes. Based upon our current code, we'll just stick with a null check. We don't want to process any events until all of the informers have started/synced.

metacosm commented 3 years ago

I'll add a null check to AbstractEventSource regardless. Thank you!

shawkins commented 3 years ago

I'll add a null check to AbstractEventSource regardless. Thank you!

Do you mean have AbstractEventSource implement EventHandler and handle the null check in AbstractEventSource.handleEvent?

metacosm commented 3 years ago

That was the plan but maybe it might cause more trouble than worth? Another idea was to potentially buffer the events that are received while the event handler isn't set so that they can be replayed when it is, but I'm not sure that makes sense…

shawkins commented 3 years ago

Our scenario is that we want all of the informers to have synced before any of the events (primary custom resource or the operands are processed) so that we can use the associated Store/Cache and it will be relatively up-to-date. This wasn't straight-forward with the default implementation of informers in the fabric8 client, so we are temporarily using our own while the upstream is refined. So in our case individual starts aren't really meaningful.

Another idea was to potentially buffer the events that are received while the event handler isn't set so that they can be replayed when it is, but I'm not sure that makes sense…

That is not needed in our case - as long as the primary custom resource events get processed after the initial sync, we can simply ignore any of the other events until then.

metacosm commented 3 years ago

Note that the quarkus extension allows the registration of the controller (and hence related event sources) to be delayed until a specific CDI event is received. See https://github.com/quarkiverse/quarkus-operator-sdk/blob/master/runtime/src/main/java/io/quarkiverse/operatorsdk/runtime/DelayRegistrationUntil.java, https://github.com/quarkiverse/quarkus-operator-sdk/blob/master/integration-tests/src/main/java/io/quarkiverse/operatorsdk/it/TestController.java and https://github.com/quarkiverse/quarkus-operator-sdk/blob/master/integration-tests/src/main/java/io/quarkiverse/operatorsdk/it/OperatorSDKResource.java#L31-L38 for more details.

shawkins commented 3 years ago

Thanks, once our upgrade to quarkus 1.13.1 / operator sdk 1.8.2 moves forward I'll make use of that.

shawkins commented 3 years ago

@metacosm I'm trying this out now with 1.8.2 and it doesn't appear to work. DelayRegistrationUntil has a source retention policy. At least when I try to debug what is happening, I see that when OperatorSDKProcessor.createControllerConfiguration is looking for the annotation it can't find it.

jmrodri commented 3 years ago

/assign @jmrodri

csviri commented 2 years ago

Will close this, since we implemented the start/stop (close) methods meanwhile.