Hi, again! Thanks for including my patch in 4.1.3. However, the divergence detector that was also included broke backwards compatibility again. We run JulieOps with allow.delete.topics=false because we have an outside process for handling obsolete topics. With 4.1.3, JulieOps therefore throws an exception because the state contains these topics while the cluster does not. Given the way we use JulieOps, this is expected and not an error.
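For reference, a minimal JulieOps client properties file for this setup; the bootstrap address is an illustrative placeholder, allow.delete.topics is the setting at issue:

# JulieOps client config (placeholder values)
bootstrap.servers=broker:9092
# obsolete topics are handled by an external process, never deleted by JulieOps
allow.delete.topics=false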
How should we handle this? Can the divergence checker be made optional?
@purbon, as Sverre mentions, this feature causes some issues. With allow.delete.topics=false, the normal flow is to remove topics from the topology first and remove them completely later, and in a multi-tenant setup new JulieOps runs can easily be triggered in between. So when allow.delete.topics=false, this feature should not throw an exception. At the very least it should be controlled by a feature flag, or perhaps the divergences should just be logged as errors/warnings without terminating JulieOps.
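Something as simple as one extra property would cover it; the name below is purely hypothetical, not an existing JulieOps option:

# hypothetical flag, shown only to illustrate the proposal
julie.verify.remote.state=false   # log divergences instead of throwing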
Side note: as you state, "Changing resources outside the scope of JulieOps is not good practice". However, I think the way to handle this is to bootstrap ACLs correctly for the cluster: only the JulieOps internal/admin user should be allowed to change topics/ACLs etc. after cluster installation/setup.
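For example, with the plain Kafka ACL authorizer (default-deny once enabled), granting the topic management operations to the JulieOps principal alone could look roughly like this; the principal name, bootstrap address, and admin config file are placeholders:

kafka-acls --bootstrap-server broker:9092 --command-config admin.properties \
  --add --allow-principal User:julieops \
  --operation Create --operation Alter --operation Delete \
  --topic '*'

Since no other principal is granted these operations, topics can then only change through JulieOps.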
Hello!
I'm wondering if anybody else is having problems managing schema-registry permissions with JulieOps and the RBAC provider after this change (I am using version 4.2.0, Confluent Platform 7.0). Permissions are created on the first run, but subsequent executions fail with this error:
com.purbon.kafka.topology.exceptions.RemoteValidationException: Your remote state has changed since the last execution, this ACL(s): 'Subject', 'test.ega.topic-value', '*', 'ResourceOwner', 'User:egarjans', 'LITERAL' are in your local state, but not in the cluster, please investigate!
    at com.purbon.kafka.topology.AccessControlManager.detectDivergencesInTheRemoteCluster(AccessControlManager.java:110)
    at com.purbon.kafka.topology.AccessControlManager.loadActualClusterStateIfAvailable(AccessControlManager.java:89)
    at com.purbon.kafka.topology.AccessControlManager.updatePlan(AccessControlManager.java:72)
    at com.purbon.kafka.topology.JulieOps.run(JulieOps.java:200)
    at com.purbon.kafka.topology.JulieOps.run(JulieOps.java:225)
    at com.purbon.kafka.topology.CommandLineInterface.processTopology(CommandLineInterface.java:212)
    at com.purbon.kafka.topology.CommandLineInterface.run(CommandLineInterface.java:161)
    at com.purbon.kafka.topology.CommandLineInterface.main(CommandLineInterface.java:147)
The descriptor.yml file looks like this:
context: test
# source: source
projects:
  - name: ega
    schemas:
      - principal: "User:egarjans"
        subjects:
          - "test.ega.topic-value"
        role: "ResourceOwner"
.cluster-state contains the new ACL:
{
  "resourceType" : "Subject",
  "resourceName" : "test.ega.topic-value",
  "host" : "*",
  "operation" : "ResourceOwner",
  "principal" : "User:egarjans",
  "pattern" : "LITERAL",
  "scope" : {
    "clusters" : {
      "kafka-cluster" : "9OpGFe2SSQC9HiEFXSBCpw",
      "schema-registry-cluster" : "schema-registry"
    },
    "resources" : [ {
      "name" : "test.ega.topic-value",
      "patternType" : "LITERAL",
      "resourceType" : "Subject"
    } ]
  }
},
From the Confluent CLI I can see that the permissions exist on the cluster, but validation still fails:
egarjans@LTPF2M88JH:~$ confluent iam rolebinding list --kafka-cluster-id 9OpGFe2SSQC9HiEFXSBCpw --schema-registry-cluster-id schema-registry --principal "User:adm.e.garjans" --role ResourceOwner
   Principal   |     Role      | ResourceType |         Name         | PatternType
+--------------+---------------+--------------+----------------------+-------------+
 User:egarjans | ResourceOwner | Subject      | test.ega.topic-value | LITERAL
Please check if the PR fulfills these requirements
[x] The commit messages are descriptive
[x] Tests for the changes have been added (for bug fixes / features)
[ ] Docs have been added / updated (for bug fixes / features)
[x] An issue has been created for the pull request. Some issues might require prior discussion.
What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
This PR introduces a new feature to detect divergences between the local state and the remote cluster status. For this first version, JulieOps will raise an exception and 🤯 until the divergence is fixed, either by updating the local state or by fixing the remote cluster. Note: in future versions this behaviour will be extended with more granular support.
Managers supported:
Note: changing resources outside the scope of JulieOps is not good practice, but this PR will help teams detect such cases when they happen.
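Conceptually, the check is a set difference between the bindings recorded in local state and the bindings the cluster reports. The sketch below is a simplified illustration of that idea, not JulieOps' actual implementation; bindings are modelled as plain strings here, while the real code compares ACL binding objects field by field:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DivergenceChecker {
  // Simplified sketch of the idea behind detectDivergencesInTheRemoteCluster;
  // not the actual JulieOps code.
  static void detectDivergences(List<String> localState, List<String> clusterState) {
    Set<String> missing = new HashSet<>(localState);
    missing.removeAll(clusterState); // left over = recorded locally, absent remotely
    if (!missing.isEmpty()) {
      throw new IllegalStateException(
          "Your remote state has changed since the last execution, this ACL(s): "
              + missing + " are in your local state, but not in the cluster");
    }
  }
}

Because the comparison is an exact match on every field, any mismatch between how a binding is stored in .cluster-state and how the cluster API reports it back will surface as "missing", which may be what is happening in the RBAC case above.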