operator-framework / java-operator-sdk

Java SDK for building Kubernetes Operators
https://javaoperatorsdk.io/
Apache License 2.0
795 stars 214 forks source link

AbstractDependentResource should be able to pass his turn #1105

Closed scrocquesel closed 2 years ago

scrocquesel commented 2 years ago

Actually, when the secret is not yet created, the mysql schema sample is throwing some errors in SchemaDependentResource due to orElseThrow.

https://github.com/java-operator-sdk/java-operator-sdk/blob/2eac098db872cedcefecb6036ec98c77b58cf60f/sample-operators/mysql-schema/src/main/java/io/javaoperatorsdk/operator/sample/dependent/SchemaDependentResource.java#L56

The same apply with the reconciler

https://github.com/java-operator-sdk/java-operator-sdk/blob/2eac098db872cedcefecb6036ec98c77b58cf60f/sample-operators/mysql-schema/src/main/java/io/javaoperatorsdk/operator/sample/MySQLSchemaReconciler.java#L39

I'm trying to avoid these exceptions because there are not really exceptions, but just the fact that some requirement are not yet satisfied. So I tried,

@Override
    public Schema desired(MySQLSchema primary, Context<MySQLSchema> context) {
        return context.getSecondaryResource(Secret.class)
                .map(secret -> new Schema(primary.getMetadata().getName(), primary.getSpec().getEncoding()))
                .orElse(null);
    }

But AbstractDependentResource.logForOperation produces an NPE and even if it doesn't, I think that create should not be call.

It is not clear what to do in such a case. I don't feel like exception are the way to go because it raise a call to updateErrorStatus when it should definitely not. Should desired return an Optional<R> ?

metacosm commented 2 years ago

I wanted to have an ERROR case in ReconcileResult but we decided that this was too much of a behavioral change at this point. That said, I agree with you, right now, we're using exceptions for behavior / control and it's a bad design, imo.

csviri commented 2 years ago

How can it happen that the secret is not created (I mean other than an error state?). The secret is a managed dependent resource that is created before the schema. Managed dependent resources are created in order.

It's true that lately the behavior changed, that we reconcile all the resources event if the previous ones failed. This makes issues when the first operation fails, but only if the resources are not independent.

In this case a fail fast approach would be better in the current scope. So not reconcile resources if one fails before.

csviri commented 2 years ago

I wanted to have an ERROR case in ReconcileResult but we decided that this was too much of a behavioral change at this point. That said, I agree with you, right now, we're using exceptions for behavior / control and it's a bad design, imo.

The ERROR does not fit in the big picture now, could you create a separate issue for that to discuss the details?

metacosm commented 2 years ago

This is the issue, right here, I think :)

csviri commented 2 years ago

I'm trying to avoid these exceptions because there are not really exceptions, but just the fact that some requirement are not yet satisfied. So I tried,

Again, what are the reason that are not satisfied yet in your case? @scrocquesel So why is the secret not created.

csviri commented 2 years ago

This is the issue, right here, I think :)

Maybe with a nice description, definition of the problem in general..

csviri commented 2 years ago

TBH I would revert this into a fail fast behavior, that would solve this problem, and would be much more easier to follow what happening from the logs.

metacosm commented 2 years ago

This is the issue, right here, I think :)

Maybe with a nice description, definition of the problem in general..

The problem is as explained in the description of the issue: people want to be able to make decisions based on the reconcile result of dependents and right now, the reconcile result doesn't cover all the possibilities. This should be part of a more general approach, of course, but that's the core of the issue. It might just be that the dependent is not yet ready, for example. It shouldn't be an error that triggers a retry because this might cause retries to get exhausted before your dependent ever gets ready, so managing this situation via exceptions is not correct, imo.

csviri commented 2 years ago

The problem is as explained in the description of the issue: people want to be able to make decisions based on the reconcile result of dependents and right now, the reconcile result doesn't cover all the possibilities. This should be part of a more general approach, of course, but that's the core of the issue. It might just be that the dependent is not yet ready, for example. It shouldn't be an error that triggers a retry because this might cause retries to get exhausted before your dependent ever gets ready, so managing this situation via exceptions is not correct, imo.

  1. That is a completely different/additional feature. Like waiting for something to be ready is completely out of scope, ( should be managed inside the reconcile() if that needs to be synchronouse?). This now can be handled with standalone mode btw (cc @scrocquesel ). But Secret has no concept of ready anyways. Also we agreed on a different thread that waiting synchronously even for a Deployment does not make sense in general.

  2. What is in this issue is basically error handling. But as we discussed before, this will be part of a bigger picture, declaratively describe workflows. Ideally imo this should be discussed also taking in mind that. see: https://github.com/java-operator-sdk/java-operator-sdk/issues/1097

  3. Handling an explicit error state would not be "seamless" integration into the current exception handling model. What is based on fail fast approach normally a generic error handler (analogy to try-catch block), see ErrorStatusHandler. It completely changes the behavior and forces users that even for managed resources always check all the ReconcileResult. I'm fine to postpone the release, but this needs to be thinked though, samples made, etc. Because that is completely different behavior and not seamless / drop in replacement at all as in original design.

  4. Again the fail fast approach is I think right in this case, if there is a problem with the Secret reconciliation, it should not reconcile further but retry. Since other resources depend on it. This was changed last week, we should rather revert this behavior, because actually makes issues like this.

csviri commented 2 years ago

people want to be able to make decisions based on the reconcile result of dependents and right now, the reconcile result doesn't cover all the possibilities

There is no decision to make in this case. It is basically should be just retries if there is an issue with secret creation. Can you describe actual use case when people want to make decision bases on error? and how would that look like?

So in this case the actual thing we want to do just retry if the secret creation fails, don't go further. Event if there is an error status in ReconcileResult not sure how that would make the result behavior different at the and.

metacosm commented 2 years ago

Note that all this discussion is out of scope for the next release but it's still interesting to talk about this problem now.

metacosm commented 2 years ago

people want to be able to make decisions based on the reconcile result of dependents and right now, the reconcile result doesn't cover all the possibilities

There is no decision to make in this case. It is basically should be just retries if there is an issue with secret creation. Can you describe actual use case when people want to make decision bases on error? and how would that look like?

The problem is that the sample currently throws an exception when the secret is not present in the context. When it's not present, it's usually not because we couldn't create it but rather because the associated schema is not ready. Handling this use case via error handling is wrong because it's not an error condition and that might lead to retries being exhausted despite the fact that nothing is actually wrong with the system. To be clear, I'm not saying that this should be fixed for the current scope but throwing an exception in this situation is not the proper way to deal with this because this is not an error, just part of the reconciliation process. Note that this problem wouldn't be solved with an error case on ReconcileResult either, though, because this isn't an error. The problem is that ReconcileResult is currently under-specified and doesn't cover all the possible cases.

csviri commented 2 years ago

The problem is that the sample currently throws an exception when the secret is not present in the context. When it's not present, it's usually not because we couldn't create it but rather because the associated schema is not ready

Will take a look, why it might happen, but if the secret if created and it's not in a context is simply a bug withing the current implementation. Will take a look on this because this actually should not happen, since we putting the resource into the event source if we created it)

However @scrocquesel clearly saying in the issue:

Actually, when the secret is not yet created, the mysql schema sample is throwing some errors

csviri commented 2 years ago

@scrocquesel could you please describe more in detail the actual situation what is happening regarding the secret, so it is really not created at that point, or how is this happening?

jmrodri commented 2 years ago

Relates to #1107

scrocquesel commented 2 years ago

Actually, if SecretDependentResource create the secret in the reconcile, it doesn't not update its cache. So when in the same loop, the SchemaDependentResource call context.getSecondaryResource(Secret.class).orElseThrow(); it throws an exception.

This happens more in the equivalent Quarkus test because Quarkus wait for the database to be ready. In the java one, SchemaDependentResource actually fails to connect to the database at the begining of the test. The secret is created and set in the cache long before, it will actually tries to create the database and tries to fetch the secret from the context.

It may be a bug or something to improve but still, it shows up that there is time when a dependent is not able to participate in the reconcile loop. To some extent, this is certainly related to what workflow will address in the future.

Meanwhile, it is not clear how it can be addressed and the NPE when logging a null desired returned value should be fixed somehow.

There is also an another use case which can be related. Think of a flag in a primary resource to create or delete a secondary resource (or activate/deactivate a feature somewhere). It means that on update, you should be able to handle a null desired state (which is not possible due to NPE in logForOperation. Then, desired() will return null and we back in the above use case where the operator should not tries to call create because the dependent do not want to create the secondary resource.

csviri commented 2 years ago

Thank you for response @scrocquesel !

Actually, if SecretDependentResource create the secret in the reconcile, it doesn't not update its cache. So when in the same loop, the SchemaDependentResource call context.getSecondaryResource(Secret.class).orElseThrow(); it throws an exception.

This sounds like a bug, I will take a look today. (Although we have a Integration test that never failed )

There is also an another use case which can be related. Think of a flag in a primary resource to create or delete a secondary resource (or activate/deactivate a feature somewhere). It means that on update, you should be able to handle a null desired state (which is not possible due to NPE in logForOperation. Then, desired() will return null and we back in the above use case where the operator should not tries to call create because the dependent do not want to create the secondary resource.

In this case the reconcile should not be called. See: https://github.com/java-operator-sdk/java-operator-sdk/blob/4301fe3e2b1e2d49753e4876854f8084680f1234/sample-operators/webpage/src/main/java/io/javaoperatorsdk/operator/sample/WebPageStandaloneDependentsReconciler.java#L68-L72

There is a separate delete for this. It should be used like this IMO. Managed dependent resources simply don't support this use case now.

It may be a bug or something to improve but still, it shows up that there is time when a dependent is not able to participate in the reconcile loop. To some extent, this is certainly related to what workflow will address in the future.

My opinion is the same as mentioned before. If its not "not able to participate in the reconcile loop" then the reconcile should not be called. So let's say we fix the bug that the resources is properly available in the context. We have 3 options:

  1. fail fast - this is I would prefer.
  2. Throw an exception if the resources we depend on not present as now.
  3. change the sample to standalone dependent resources (since the dependents are not independent of each other, so by definition this not works well in case of an error)

But we should not even really solve the case if desired state is null/optional - this seems to me a hack. That problem should be done on different level. And basically we should not call reconcile() if does not make sense.

We can take a look on the current managed dependent resources as a simple linear workflow, and IMO if the workflow breaks because of an exception we should simply not proceed, just retry the reconciliation (we some delay). This will be the logic probably in more complex workflows too.

csviri commented 2 years ago

Just to make sure @scrocquesel are you sure you are using the up to date version from the main?

It would be great if you could provide some logs. This does not seems to fail on our github actions runs at all.

metacosm commented 2 years ago

I think that this and other use cases should be handled by more complete ReconcileResult in future versions so that DependentResource implementations can tell the main reconciler what the intent is so that it could make (possibly automated by a simple, generic workflow engine) decisions. Right now, the result is a little too binary to handle many useful cases.

scrocquesel commented 2 years ago

Yes, there IS a draft PR showing this issue.

csviri commented 2 years ago

I checked the code and there is an integration test that tests the creation / access, see:

https://github.com/java-operator-sdk/java-operator-sdk/blob/50f28d971283b7169967ab5780a16f5a938691a1/operator-framework/src/test/java/io/javaoperatorsdk/operator/sample/standalonedependent/StandaloneDependentTestReconciler.java#L50-L54

I wonder if this can be false negative. But so this never happened that it failed. @scrocquesel pls try to provide the logs, when the getSecondaryResource not returns the resource and this fails. Otherwise would be very hard to find the root cause atm.

csviri commented 2 years ago

I think found the issue, at least one issue :)

https://github.com/java-operator-sdk/java-operator-sdk/blob/50f28d971283b7169967ab5780a16f5a938691a1/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/Controller.java#L193-L193

Lately the managed dependent were resources changed to have names, so it is stored in a HashMap. But that does not guarantee any order. So it is really works only if the resource are independent. Porbably it is just accidentally right in this sample. IMO MySQLSchema sample should be migrated to standalone mode, since does not really fits therefore to this use case.

(Or we should implement some ordering guarantees at least. No strong opinion on this.)

cc @metacosm

metacosm commented 2 years ago

I think it's better to implement ordering for now since that's how managed dependent resources were supposed to work initially (with the understanding that this ordering is pretty weak and that we would probably implement stronger ordering in future releases).

scrocquesel commented 2 years ago

I think found the issue, at least one issue :)

https://github.com/java-operator-sdk/java-operator-sdk/blob/50f28d971283b7169967ab5780a16f5a938691a1/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/Controller.java#L193-L193

Lately the managed dependent were resources changed to have names, so it is stored in a HashMap. But that does not guarantee any order. So it is really works only if the resource are independent. Porbably it is just accidentally right in this sample. IMO MySQLSchema sample should be migrated to standalone mode, since does not really fits therefore to this use case.

(Or we should implement some ordering guarantees at least. No strong opinion on this.)

cc @metacosm

That's right, I actually thought the order of declaration in annotation was used to call the dependent and the hashmap doesn't guarantee it anymore. Nevertheless, it happens for me that for mysql-schema the order of call is respected for all test runs.

What is important in your comment is that actually if two dependents are interdependent (one require the other for eg), then we should better use standalone mode for now. I agree the sample actually is misleading as I thought, it should have some very simple logic right now to handle a simple chain logic, even if it implies more code in the dependent resource.

scrocquesel commented 2 years ago

I checked the code and there is an integration test that tests the creation / access, see:

https://github.com/java-operator-sdk/java-operator-sdk/blob/50f28d971283b7169967ab5780a16f5a938691a1/operator-framework/src/test/java/io/javaoperatorsdk/operator/sample/standalonedependent/StandaloneDependentTestReconciler.java#L50-L54

I wonder if this can be false negative. But so this never happened that it failed. @scrocquesel pls try to provide the logs, when the getSecondaryResource not returns the resource and this fails. Otherwise would be very hard to find the root cause atm.

19:23:25.584 INFO  Controller.java               (305) Starting 'mysqlschemareconciler' controller for reconciler: io.javaoperatorsdk.operator.sample.MySQLSchemaReconciler, resource: io.javaoperatorsdk.operator.sample.MySQLSchema
19:23:25.588 DEBUG EventSourceManager.java       (91) Starting event source io.javaoperatorsdk.operator.sample.dependent.SecretDependentResource
19:23:25.588 DEBUG DefaultSharedIndexInformer.java(142) informer: ready to run resync and reflector for class io.fabric8.kubernetes.api.model.Secret with resync 0
19:23:25.588 DEBUG DefaultSharedIndexInformer.java(212) informer#Controller: resync skipped due to 0 full resync period class io.fabric8.kubernetes.api.model.Secret
19:23:25.615 DEBUG InformerEventSource.java      (112) Propagating event for add, resource with same version not result of a reconciliation. Resource ID: ResourceID{name='default-token-4qsdv', namespace='mysql'}
19:23:25.615 DEBUG InformerEventSource.java      (112) Propagating event for add, resource with same version not result of a reconciliation. Resource ID: ResourceID{name='default-token-krr4c', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.615 DEBUG Reflector.java                (95) Listing items (43) for resource class io.fabric8.kubernetes.api.model.Secret v2111938
19:23:25.615 DEBUG Reflector.java                (103) Starting watcher for resource class io.fabric8.kubernetes.api.model.Secret v2111938
19:23:25.616 DEBUG AbstractWatchManager.java     (222) Watching https://kubernetes.docker.internal:6443/api/v1/secrets?resourceVersion=2111938&allowWatchBookmarks=true&watch=true...
19:23:25.631 DEBUG WatcherWebSocketListener.java (43) WebSocket successfully opened
19:23:25.631 DEBUG EventSourceManager.java       (91) Started event source io.javaoperatorsdk.operator.sample.dependent.SecretDependentResource
19:23:25.631 DEBUG EventSourceManager.java       (91) Starting event source io.javaoperatorsdk.operator.processing.event.source.timer.TimerEventSource#167146668
19:23:25.631 DEBUG EventSourceManager.java       (91) Started event source io.javaoperatorsdk.operator.processing.event.source.timer.TimerEventSource#167146668
19:23:25.632 DEBUG EventSourceManager.java       (91) Starting event source io.javaoperatorsdk.operator.processing.event.source.controller.ControllerResourceEventSource#600554759
19:23:25.632 DEBUG DefaultSharedIndexInformer.java(142) informer: ready to run resync and reflector for class io.javaoperatorsdk.operator.sample.MySQLSchema with resync 0
19:23:25.632 DEBUG DefaultSharedIndexInformer.java(212) informer#Controller: resync skipped due to 0 full resync period class io.javaoperatorsdk.operator.sample.MySQLSchema
19:23:25.638 DEBUG Reflector.java                (95) Listing items (0) for resource class io.javaoperatorsdk.operator.sample.MySQLSchema v2111938
19:23:25.638 DEBUG Reflector.java                (103) Starting watcher for resource class io.javaoperatorsdk.operator.sample.MySQLSchema v2111938
19:23:25.639 DEBUG AbstractWatchManager.java     (222) Watching https://kubernetes.docker.internal:6443/apis/mysql.sample.javaoperatorsdk/v1/namespaces/mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8/mysqlschemas?resourceVersion=2111938&allowWatchBookmarks=true&watch=true...
19:23:25.661 DEBUG EventSourceManager.java       (91) Started event source io.javaoperatorsdk.operator.processing.event.source.controller.ControllerResourceEventSource#600554759
19:23:25.661 DEBUG WatcherWebSocketListener.java (43) WebSocket successfully opened
19:23:25.661 DEBUG EventSourceManager.java       (91) Starting event source schema
19:23:25.661 DEBUG EventSourceManager.java       (91) Started event source schema
19:23:25.662 INFO  Controller.java               (333) 'mysqlschemareconciler' controller started, pending event sources initialization
19:23:25.670 INFO  MySQLSchemaOperatorE2E.java   (112) Creating test MySQLSchema object: CustomResource{kind='MySQLSchema', apiVersion='mysql.sample.javaoperatorsdk/v1', metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=null, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[], generateName=null, generation=null, labels=null, managedFields=[], name=mydb1, namespace=mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8, ownerReferences=[], resourceVersion=null, selfLink=null, uid=null, additionalProperties={}), spec=io.javaoperatorsdk.operator.sample.SchemaSpec@4087c7fc, status=null}
19:23:25.696 DEBUG Reflector.java                (139) Event received ADDED MySQLSchema resourceVersion 2111939
19:23:25.696 INFO  MySQLSchemaOperatorE2E.java   (115) Waiting 2 minutes for expected resources to be created and updated
19:23:25.697 DEBUG ControllerResourceEventSource.java(60) Event received for resource: mydb1
19:23:25.701 DEBUG EventProcessor.java           (109) Received event: ResourceEvent{action=ADDED, associated resource id=ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}}
19:23:25.702 DEBUG EventProcessor.java           (169) Marking event received for: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.717 DEBUG EventProcessor.java           (146) Executing events for custom resource. Scope: ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111939}
19:23:25.718 DEBUG ReconciliationDispatcher.java (59) Handling dispatch for resource mydb1
19:23:25.720 DEBUG ReconciliationDispatcher.java (281) Adding finalizer for resource: 5bcee73a-800a-4528-b6da-0e93f6bed285 version: 2111939
19:23:25.720 DEBUG ReconciliationDispatcher.java (303) Trying to replace resource mydb1, version: 2111939
19:23:25.731 DEBUG Reflector.java                (139) Event received MODIFIED MySQLSchema resourceVersion 2111940
19:23:25.731 DEBUG ControllerResourceEventSource.java(60) Event received for resource: mydb1
19:23:25.731 DEBUG EventProcessor.java           (186) Event processing finished. Scope: ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111939}, PostExecutionControl: PostExecutionControl{onlyFinalizerHandled=true, updatedCustomResource=null, runtimeException=null}
19:23:25.731 DEBUG EventProcessor.java           (271) Cleanup for successful execution for resource: mydb1
19:23:25.732 DEBUG EventProcessor.java           (109) Received event: ResourceEvent{action=UPDATED, associated resource id=ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}}
19:23:25.732 DEBUG EventProcessor.java           (169) Marking event received for: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.733 DEBUG EventProcessor.java           (146) Executing events for custom resource. Scope: ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111940}
19:23:25.734 DEBUG ReconciliationDispatcher.java (59) Handling dispatch for resource mydb1
19:23:25.736 DEBUG ReconciliationDispatcher.java (127) Reconciling resource mydb1 with version: 2111940 with execution scope: ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111940}
19:23:25.740 DEBUG SchemaDependentResource.java  (71) Connecting to 'jdbc:mysql://127.0.0.1:3307' with user 'root'
19:23:25.770 DEBUG PortForwarderWebsocket.java   (187) FWD: onOpen
19:23:25.787 INFO  AbstractDependentResource.java(67) Creating Schema for primary ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.788 DEBUG AbstractDependentResource.java(68) Creating dependent io.javaoperatorsdk.operator.sample.schema.Schema@c10f35cb for primary CustomResource{kind='MySQLSchema', apiVersion='mysql.sample.javaoperatorsdk/v1', metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=2022-04-01T17:23:25Z, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[mysqlschemas.mysql.sample.javaoperatorsdk/finalizer], generateName=null, generation=1, labels=null, managedFields=[ManagedFieldsEntry(apiVersion=mysql.sample.javaoperatorsdk/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:metadata={f:finalizers={.={}, v:"mysqlschemas.mysql.sample.javaoperatorsdk/finalizer"={}}}, f:spec={.={}, f:encoding={}}}), manager=okhttp, operation=Update, subresource=null, time=2022-04-01T17:23:25Z, additionalProperties={})], name=mydb1, namespace=mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8, ownerReferences=[], resourceVersion=2111940, selfLink=null, uid=5bcee73a-800a-4528-b6da-0e93f6bed285, additionalProperties={}), spec=io.javaoperatorsdk.operator.sample.SchemaSpec@19a5592d, status=null}
19:23:25.788 DEBUG SchemaDependentResource.java  (71) Connecting to 'jdbc:mysql://127.0.0.1:3307' with user 'root'
19:23:25.819 DEBUG PortForwarderWebsocket.java   (187) FWD: onOpen
19:23:25.836 INFO  AbstractDependentResource.java(67) Creating 'mydb1-secret' Secret for primary ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.837 DEBUG AbstractDependentResource.java(68) Creating dependent Secret(apiVersion=v1, data={mysql.secret.user.name=bXlkYjEtdXNlcg==, mysql.secret.user.password=OE1pdFQ4QjJwMldjM3Y3WQ==}, immutable=null, kind=Secret, metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=null, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[], generateName=null, generation=null, labels=null, managedFields=[], name=mydb1-secret, namespace=mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8, ownerReferences=[], resourceVersion=null, selfLink=null, uid=null, additionalProperties={}), stringData=null, type=null, additionalProperties={}) for primary CustomResource{kind='MySQLSchema', apiVersion='mysql.sample.javaoperatorsdk/v1', metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=2022-04-01T17:23:25Z, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[mysqlschemas.mysql.sample.javaoperatorsdk/finalizer], generateName=null, generation=1, labels=null, managedFields=[ManagedFieldsEntry(apiVersion=mysql.sample.javaoperatorsdk/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:metadata={f:finalizers={.={}, v:"mysqlschemas.mysql.sample.javaoperatorsdk/finalizer"={}}}, f:spec={.={}, f:encoding={}}}), manager=okhttp, operation=Update, subresource=null, time=2022-04-01T17:23:25Z, additionalProperties={})], name=mydb1, namespace=mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8, ownerReferences=[], resourceVersion=2111940, selfLink=null, uid=5bcee73a-800a-4528-b6da-0e93f6bed285, additionalProperties={}), spec=io.javaoperatorsdk.operator.sample.SchemaSpec@19a5592d, status=null}
19:23:25.838 DEBUG InformerEventSource.java      (226) Starting event recording for: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.838 DEBUG KubernetesDependentResource.java(108) Creating target resource with type: class io.fabric8.kubernetes.api.model.Secret, with id: ResourceID{name='mydb1-secret', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.847 DEBUG Reflector.java                (139) Event received ADDED Secret resourceVersion 2111941
19:23:25.853 DEBUG TemporaryResourceCache.java   (58) Won't put resource into cache found already informer cache: ResourceID{name='mydb1-secret', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.853 DEBUG InformerEventSource.java      (112) Propagating event for add, resource with same version not result of a reconciliation. Resource ID: ResourceID{name='mydb1-secret', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.854 INFO  Controller.java               (197) Reconciled dependent 'io.javaoperatorsdk.operator.sample.dependent.SecretDependentResource' -> CREATED
19:23:25.854 DEBUG EventProcessor.java           (109) Received event: Event{relatedCustomResource=ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}}
19:23:25.854 DEBUG EventProcessor.java           (169) Marking event received for: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.855 DEBUG EventProcessor.java           (149) Skipping executing controller for resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}. Controller in execution: true. Latest Resource present: true
19:23:25.873 DEBUG Reflector.java                (139) Event received MODIFIED MySQLSchema resourceVersion 2111942
19:23:25.874 DEBUG ControllerResourceEventSource.java(60) Event received for resource: mydb1
19:23:25.875 DEBUG ControllerResourceEventSource.java(67) Skipping event handling resource 5bcee73a-800a-4528-b6da-0e93f6bed285 with version: 2111942
19:23:25.874 ERROR ReconciliationDispatcher.java (51) Error during event processing ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111940} failed.
io.javaoperatorsdk.operator.AggregatedOperatorException: One or more DependentResource(s) failed:
        - Error reconciling dependent 'schema': No value present at: io.javaoperatorsdk.operator.sample.dependent.SchemaDependentResource.create(SchemaDependentResource.java:56)
    at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:206) ~[classes/:?]
    at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:1) ~[classes/:?]
    at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34) ~[classes/:?]
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:165) ~[classes/:?]
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:133) ~[classes/:?]
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:101) ~[classes/:?]
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:74) ~[classes/:?]
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:49) ~[classes/:?]
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:353) ~[classes/:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
19:23:25.881 DEBUG EventProcessor.java           (186) Event processing finished. Scope: ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111940}, PostExecutionControl: PostExecutionControl{onlyFinalizerHandled=false, updatedCustomResource=null, runtimeException=io.javaoperatorsdk.operator.AggregatedOperatorException: One or more DependentResource(s) failed:
        - Error reconciling dependent 'schema': No value present at: io.javaoperatorsdk.operator.sample.dependent.SchemaDependentResource.create(SchemaDependentResource.java:56)}
19:23:25.882 DEBUG EventProcessor.java           (252) New events exists for for resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.884 DEBUG EventProcessor.java           (146) Executing events for custom resource. Scope: ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111942}
19:23:25.885 DEBUG ReconciliationDispatcher.java (59) Handling dispatch for resource mydb1
19:23:25.886 DEBUG ReconciliationDispatcher.java (127) Reconciling resource mydb1 with version: 2111942 with execution scope: ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111942}
19:23:25.889 DEBUG SchemaDependentResource.java  (71) Connecting to 'jdbc:mysql://127.0.0.1:3307' with user 'root'
19:23:25.923 DEBUG PortForwarderWebsocket.java   (187) FWD: onOpen
19:23:25.938 INFO  AbstractDependentResource.java(67) Creating Schema for primary ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}
19:23:25.939 DEBUG AbstractDependentResource.java(68) Creating dependent io.javaoperatorsdk.operator.sample.schema.Schema@c10f35cb for primary CustomResource{kind='MySQLSchema', apiVersion='mysql.sample.javaoperatorsdk/v1', metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=2022-04-01T17:23:25Z, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[mysqlschemas.mysql.sample.javaoperatorsdk/finalizer], generateName=null, generation=1, labels=null, managedFields=[ManagedFieldsEntry(apiVersion=mysql.sample.javaoperatorsdk/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:metadata={f:finalizers={.={}, v:"mysqlschemas.mysql.sample.javaoperatorsdk/finalizer"={}}}, f:spec={.={}, f:encoding={}}}), manager=okhttp, operation=Update, subresource=null, time=2022-04-01T17:23:25Z, additionalProperties={}), ManagedFieldsEntry(apiVersion=mysql.sample.javaoperatorsdk/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:status={.={}, f:status={}}}), manager=okhttp, operation=Update, subresource=status, time=2022-04-01T17:23:25Z, additionalProperties={})], name=mydb1, namespace=mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8, ownerReferences=[], resourceVersion=2111942, selfLink=null, uid=5bcee73a-800a-4528-b6da-0e93f6bed285, additionalProperties={}), spec=io.javaoperatorsdk.operator.sample.SchemaSpec@574f3fef, status=io.javaoperatorsdk.operator.sample.SchemaStatus@a38f47a}
19:23:25.939 DEBUG SchemaDependentResource.java  (71) Connecting to 'jdbc:mysql://127.0.0.1:3307' with user 'root'
19:23:25.971 DEBUG PortForwarderWebsocket.java   (187) FWD: onOpen
19:23:25.999 INFO  Controller.java               (197) Reconciled dependent 'schema' -> CREATED
19:23:26.000 DEBUG AbstractDependentResource.java(51) Update skipped for dependent Secret(apiVersion=v1, data={mysql.secret.user.name=bXlkYjEtdXNlcg==, mysql.secret.user.password=OE1pdFQ4QjJwMldjM3Y3WQ==}, immutable=null, kind=Secret, metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=2022-04-01T17:23:25Z, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[], generateName=null, generation=null, labels=null, managedFields=[ManagedFieldsEntry(apiVersion=v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:data={.={}, f:mysql.secret.user.name={}, f:mysql.secret.user.password={}}, f:metadata={f:ownerReferences={.={}, k:{"uid":"5bcee73a-800a-4528-b6da-0e93f6bed285"}={}}}, f:type={}}), manager=okhttp, operation=Update, subresource=null, time=2022-04-01T17:23:25Z, additionalProperties={})], name=mydb1-secret, namespace=mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8, ownerReferences=[OwnerReference(apiVersion=mysql.sample.javaoperatorsdk/v1, blockOwnerDeletion=null, controller=null, kind=MySQLSchema, name=mydb1, uid=5bcee73a-800a-4528-b6da-0e93f6bed285, additionalProperties={})], resourceVersion=2111941, selfLink=null, uid=6b7f081e-5e8f-43ec-b402-901dfbcc8578, additionalProperties={}), stringData=null, type=Opaque, additionalProperties={}) as it matched the existing one
19:23:26.000 INFO  Controller.java               (197) Reconciled dependent 'io.javaoperatorsdk.operator.sample.dependent.SecretDependentResource' -> NONE
19:23:26.001 INFO  MySQLSchemaReconciler.java    (44) Schema mydb1 created - updating CR status
19:23:26.013 DEBUG Reflector.java                (139) Event received MODIFIED MySQLSchema resourceVersion 2111943
19:23:26.013 DEBUG ControllerResourceEventSource.java(60) Event received for resource: mydb1
19:23:26.013 DEBUG ControllerResourceEventSource.java(67) Skipping event handling resource 5bcee73a-800a-4528-b6da-0e93f6bed285 with version: 2111943
19:23:26.015 DEBUG EventProcessor.java           (186) Event processing finished. Scope: ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}, version: 2111942}, PostExecutionControl: PostExecutionControl{onlyFinalizerHandled=false, updatedCustomResource=CustomResource{kind='MySQLSchema', apiVersion='mysql.sample.javaoperatorsdk/v1', metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=2022-04-01T17:23:25Z, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[mysqlschemas.mysql.sample.javaoperatorsdk/finalizer], generateName=null, generation=1, labels=null, managedFields=[ManagedFieldsEntry(apiVersion=mysql.sample.javaoperatorsdk/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:metadata={f:finalizers={.={}, v:"mysqlschemas.mysql.sample.javaoperatorsdk/finalizer"={}}}, f:spec={.={}, f:encoding={}}}), manager=okhttp, operation=Update, subresource=null, time=2022-04-01T17:23:25Z, additionalProperties={}), ManagedFieldsEntry(apiVersion=mysql.sample.javaoperatorsdk/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:status={.={}, f:observedGeneration={}, f:secretName={}, f:status={}, f:url={}, f:userName={}}}), manager=okhttp, operation=Update, subresource=status, time=2022-04-01T17:23:26Z, additionalProperties={})], name=mydb1, namespace=mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8, ownerReferences=[], resourceVersion=2111943, selfLink=null, uid=5bcee73a-800a-4528-b6da-0e93f6bed285, additionalProperties={}), spec=io.javaoperatorsdk.operator.sample.SchemaSpec@1f6888aa, status=io.javaoperatorsdk.operator.sample.SchemaStatus@5bdb1d57}, runtimeException=null}
19:23:26.015 DEBUG EventProcessor.java           (271) Cleanup for successful execution for resource: mydb1
19:23:26.016 DEBUG EventProcessor.java           (228) ReScheduling event for resource: ResourceID{name='mydb1', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'} with delay: 36000000
19:23:26.022 INFO  MySQLSchemaOperatorE2E.java   (137) Observed a ERROR status while awaiting: true
csviri commented 2 years ago

@scrocquesel thank you!!

We found a potentially related bug, will take a look on this withing that issue asap!

csviri commented 2 years ago

This is very strange since it cleary says:

19:23:25.853 DEBUG TemporaryResourceCache.java   (58) Won't put resource into cache found already informer cache: ResourceID{name='mydb1-secret', namespace='mysqlschemaoperatore2e-test-06099adb-7214-4d23-a76a-42122bb9aa8'}

What means when Secret is created and it wants to put it into the temp cache, it does not since informer already knows about it. See:

https://github.com/java-operator-sdk/java-operator-sdk/blob/51f0d4ea41d4f62f3ff52c76980f8ffc8ec544af/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/TemporaryResourceCache.java#L52-L61

One thing came to my mind is that the secret could be deleted? Probably not.

But then on retry is seems to be succeeding:

19:23:25.999 INFO  Controller.java               (197) Reconciled dependent 'schema' -> CREATED

Nevertheless we should retest this after: https://github.com/java-operator-sdk/java-operator-sdk/pull/1129 Since this fixes a related issue.

csviri commented 2 years ago

@scrocquesel we merged the mentioned issue, could you pls retest this?

scrocquesel commented 2 years ago

@csviri It's seems the error is still in JOSDK 3.0.0.RC3 but the log changed. here with the quarkus flavor:

2022-04-29 20:48:01,426 INFO  [io.jav.ope.pro.dep.AbstractDependentResource] (EventHandler-mysqlschemareconciler) Creating 'mydb1-secret' Secret for primary ResourceID{name='mydb1', namespace='test'}
2022-04-29 20:48:01,426 DEBUG [io.jav.ope.pro.dep.AbstractDependentResource] (EventHandler-mysqlschemareconciler) Creating dependent Secret(apiVersion=v1, data={mysql.secret.user.name=bXlkYjEtdXNlcg==, mysql.secret.user.password=VTlXdUxhTU5NdlBoQ3FYSQ==}, immutable=null, kind=Secret, metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=null, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[], generateName=null, generation=null, labels=null, managedFields=[], name=mydb1-secret, namespace=test, ownerReferences=[], resourceVersion=null, selfLink=null, uid=null, additionalProperties={}), stringData=null, type=null, additionalProperties={}) for primary CustomResource{kind='MySQLSchema', apiVersion='mysql.sample.javaoperatorsdk/v1', metadata=ObjectMeta(annotations=null, clusterName=null, creationTimestamp=2022-04-29T18:48:01.167422Z, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[mysqlschemas.mysql.sample.javaoperatorsdk/finalizer], generateName=null, generation=1, labels=null, managedFields=[], name=mydb1, namespace=test, ownerReferences=[], resourceVersion=3, selfLink=null, uid=e16447f1-e64a-4494-9b45-b4dea0f8dd73, additionalProperties={}), spec=io.quarkiverse.operatorsdk.samples.mysqlschema.SchemaSpec@15639402, status=null}
2022-04-29 20:48:01,427 DEBUG [io.jav.ope.pro.eve.sou.inf.InformerEventSource] (EventHandler-mysqlschemareconciler) Starting event recording for: ResourceID{name='mydb1-secret', namespace='test'}
2022-04-29 20:48:01,427 DEBUG [io.jav.ope.pro.dep.kub.KubernetesDependentResource] (EventHandler-mysqlschemareconciler) Creating target resource with type: class io.fabric8.kubernetes.api.model.Secret, with id: ResourceID{name='mydb1-secret', namespace='test'}
2022-04-29 20:48:01,436 DEBUG [io.fab.kub.cli.ser.moc.KubernetesAttributesExtractor] (MockWebServer /127.0.0.1:57212) fromPath /api/v1/namespaces/test/secrets : {attributes: {namespace={key:namespace, value:test}, version={key:version, value:v1}, plural={key:plural, value:secrets}}}
Apr 29, 2022 8:48:01 PM okhttp3.mockwebserver.MockWebServer$3 processOneRequest
INFO: MockWebServer[37761] received request: POST /api/v1/namespaces/test/secrets HTTP/1.1 and responded: HTTP/1.1 200 OK
2022-04-29 20:48:01,442 DEBUG [io.fab.kub.cli.inf.cac.Reflector] (OkHttp http://localhost:37761/...) Event received ADDED Secret resourceVersion 4
2022-04-29 20:48:01,442 DEBUG [io.jav.ope.pro.eve.sou.inf.InformerEventSource] (EventHandler-mysqlschemareconciler) Did not found event in buffer with target version and resource id: ResourceID{name='mydb1-secret', namespace='test'}
2022-04-29 20:48:01,442 DEBUG [io.jav.ope.pro.eve.sou.inf.InformerEventSource] (OkHttp http://localhost:37761/...) On add event received for resource id: ResourceID{name='mydb1-secret', namespace='test'}
2022-04-29 20:48:01,445 INFO  [io.jav.ope.pro.Controller] (EventHandler-mysqlschemareconciler) Reconciled dependent 'io.quarkiverse.operatorsdk.samples.mysqlschema.dependent.SecretDependentResource' -> CREATED
2022-04-29 20:48:01,446 DEBUG [io.jav.ope.pro.eve.sou.inf.InformerEventSource] (OkHttp http://localhost:37761/...) Resource found in temporal cache for id: ResourceID{name='mydb1-secret', namespace='test'} resource versions equal: true
2022-04-29 20:48:01,446 DEBUG [io.jav.ope.pro.eve.sou.inf.InformerEventSource] (OkHttp http://localhost:37761/...) Skipping event propagation for add, since was a result of a reconcile action. Resource ID: ResourceID{name='mydb1-secret', namespace='test'}
2022-04-29 20:48:01,455 DEBUG [io.fab.kub.cli.ser.moc.KubernetesAttributesExtractor] (MockWebServer /127.0.0.1:57212) fromPath /apis/mysql.sample.javaoperatorsdk/v1/namespaces/test/mysqlschemas/mydb1/status : {attributes: {api={key:api, value:mysql.sample.javaoperatorsdk}, namespace={key:namespace, value:test}, name={key:name, value:mydb1}, version={key:version, value:v1}, plural={key:plural, value:mysqlschemas}}}
Apr 29, 2022 8:48:01 PM okhttp3.mockwebserver.MockWebServer$3 processOneRequest
INFO: MockWebServer[37761] received request: PUT /apis/mysql.sample.javaoperatorsdk/v1/namespaces/test/mysqlschemas/mydb1/status HTTP/1.1 and responded: HTTP/1.1 200 OK
2022-04-29 20:48:01,464 DEBUG [io.fab.kub.cli.inf.cac.Reflector] (OkHttp http://localhost:37761/...) Event received MODIFIED MySQLSchema resourceVersion 3
2022-04-29 20:48:01,464 DEBUG [io.jav.ope.pro.eve.sou.con.ControllerResourceEventSource] (OkHttp http://localhost:37761/...) Event received for resource: mydb1
2022-04-29 20:48:01,465 DEBUG [io.jav.ope.pro.eve.sou.con.ControllerResourceEventSource] (OkHttp http://localhost:37761/...) Skipping event handling resource e16447f1-e64a-4494-9b45-b4dea0f8dd73 with version: 3
2022-04-29 20:48:01,463 ERROR [io.jav.ope.pro.eve.ReconciliationDispatcher] (EventHandler-mysqlschemareconciler) Error during event processing ExecutionScope{ resource id: ResourceID{name='mydb1', namespace='test'}, version: 3} failed.: io.javaoperatorsdk.operator.AggregatedOperatorException: One or more DependentResource(s) failed:
        - Error reconciling dependent 'schema': No value present at: io.quarkiverse.operatorsdk.samples.mysqlschema.dependent.SchemaDependentResource.create(SchemaDependentResource.java:59)
    at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:209)
Controller.java:209
    at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:165)
Controller.java:165
    at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34)
Metrics.java:34
    at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:164)
Controller.java:164
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:133)
ReconciliationDispatcher.java:133
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:101)
ReconciliationDispatcher.java:101
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:74)
ReconciliationDispatcher.java:74
    at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:49)
ReconciliationDispatcher.java:49
    at io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:353)
EventProcessor.java:353
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
ThreadPoolExecutor.java:1128
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
ThreadPoolExecutor.java:628
    at java.base/java.lang.Thread.run(Thread.java:829)

And what should interest you is that later on, when retrying, it actually tries to get it from the informer.

2022-04-29 20:48:03,507 DEBUG [io.jav.ope.pro.eve.sou.inf.ManagedInformerEventSource] (EventHandler-mysqlschemareconciler) Resource not found in temporal cache reading it from informer cache, for Resource ID: ResourceID{name='mydb1-secret', namespace='test'}
2022-04-29 20:48:03,549 INFO  [io.jav.ope.pro.Controller] (EventHandler-mysqlschemareconciler) Reconciled dependent 'schema' -> CREATED

I was able to remove the orElseThrow in the reconciler (https://github.com/quarkiverse/quarkus-operator-sdk/pull/331) but I don't remember if it tested without it.

csviri commented 2 years ago

thx @scrocquesel we will take a look later, there is now a small update regarding the mysql sample, we will test that also. This might be caused by this line (what is bad implementation): https://github.com/quarkiverse/quarkus-operator-sdk/blob/b735399b8170d1edcbb08106fd5796f25d372cfb/samples/mysql-schema/src/main/java/io/quarkiverse/operatorsdk/samples/mysqlschema/dependent/SchemaDependentResource.java#L65

csviri commented 2 years ago

@metacosm @scrocquesel could we retest this with RC4 ? now the issue from my previous comment is addressed there.

scrocquesel commented 2 years ago

Still the same log result. While debugging I observe that the primaryToSecondaryIndex is empty when SchemaDependentResource.create tries to retrieve the secret.

Could there be a race condition between the Informer thread (OkHttp http://localhost:37761/...) that call InformerEventSource onAdd and the reconciler thread (EventHandler-mysqlschemareconciler) calling InformerEventSource.getSecondaryResource ?

csviri commented 2 years ago

That is indeed possible, will take a look on it ASAP.

csviri commented 2 years ago

@scrocquesel I created an integration test for this, try to reproduce it: https://github.com/java-operator-sdk/java-operator-sdk/pull/1203/files That runs a similar workflow now 300 times for each run. So for a PR it runs it more then ~2000 times. It never failed. It does not prove anything just trying to reproduce this.

One thing I noticed that you use MockWebServer, can you reproduce this with minikube also?

metacosm commented 2 years ago

@csviri You can see the problem in action at https://github.com/quarkiverse/quarkus-operator-sdk/runs/6357781975?check_suite_focus=true#step:5:2781, I haven't looked in details yet but basically the reconciler doesn't really like when users are not throwing exceptions when the dependent is not ready :)

csviri commented 2 years ago

@csviri You can see the problem in action at https://github.com/quarkiverse/quarkus-operator-sdk/runs/6357781975?check_suite_focus=true#step:5:2781, I haven't looked in details yet but basically the reconciler doesn't really like when users are not throwing exceptions when the dependent is not ready :)

will check thx, but I was not able to reproduce this with minikube + core framework. So suspecting this is an issue with mockWebServer maybe. But will reproduce/debug it with quarkus + mockWebserver.

csviri commented 2 years ago

Checked it, and after some time of debugging, it turns out the resources are reconciled in wrong order (the problem is in quarkus extension), just not trivial to get it from logs. @metacosm will prepare the fix. thank you @metacosm @scrocquesel to get to the bottom of this!

metacosm commented 2 years ago

The fix is in in the Quarkus extension (with a quite neat short commit hash!): https://github.com/quarkiverse/quarkus-operator-sdk/pull/338/commits/cccccec 😁

scrocquesel commented 2 years ago

Cool, will add the assertion to the test soon. Thanks for debunking this.