kafka-ops / julie

A solution to help you build automation and gitops in your Apache Kafka deployments. The Kafka gitops!
MIT License

'Translation of principal failed' error with Confluent Cloud and v4.2.0 #486

Open maxschorn opened 2 years ago

maxschorn commented 2 years ago

We use the Julie Docker image v4.2.0 with Confluent Cloud. I also already saw the discussions in #456 and #438 but unfortunately do not see where the problem lies on our side.

Describe the bug

We get the error below when executing Julie:

java.io.IOException: Translation of principal User:sa-xxxx failed, please review your system configuration

    at com.purbon.kafka.topology.utils.CCloudUtils.translateIfNecessary(CCloudUtils.java:44) ~[julie-ops.jar:?]
    at com.purbon.kafka.topology.roles.CCloudAclsProvider.clearBindings(CCloudAclsProvider.java:50) ~[julie-ops.jar:?]
    at com.purbon.kafka.topology.actions.access.ClearBindings.execute(ClearBindings.java:29) ~[julie-ops.jar:?]
    at com.purbon.kafka.topology.actions.BaseAccessControlAction.run(BaseAccessControlAction.java:33) ~[julie-ops.jar:?]
    at com.purbon.kafka.topology.ExecutionPlan.execute(ExecutionPlan.java:124) ~[julie-ops.jar:?]
    at com.purbon.kafka.topology.ExecutionPlan.run(ExecutionPlan.java:101) [julie-ops.jar:?]
    at com.purbon.kafka.topology.JulieOps.run(JulieOps.java:210) [julie-ops.jar:?]
    at com.purbon.kafka.topology.JulieOps.run(JulieOps.java:225) [julie-ops.jar:?]
    at com.purbon.kafka.topology.CommandLineInterface.processTopology(CommandLineInterface.java:212) [julie-ops.jar:?]
    at com.purbon.kafka.topology.CommandLineInterface.run(CommandLineInterface.java:161) [julie-ops.jar:?]
    at com.purbon.kafka.topology.CommandLineInterface.main(CommandLineInterface.java:147) [julie-ops.jar:?]

We also tried with principal User:service-account-name. We checked that the service account exists in Confluent Cloud at that point.

To Reproduce

Steps to reproduce the behaviour:

  1. Provide the name or ID of a service account as the principal under consumers:

    context: kafka
    projects:
      - name: PROJECT_NAME
        topics:
          - name: topic.name
            config:
              num.partitions: '1'
              replication.factor: '3'
            consumers:
              - principal: User:sa-xxxx
                group: service-account-name*

    The behaviour was the same when we used the service account name (service-account-name) instead of the service account ID (sa-xxxx).

  2. Use the following Julie properties:

security.protocol=SASL_SSL
ssl.endpoint.identification.algorithm=https
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<CLUSTER_API_KEY>" \
  password="<CLUSTER_API_SECRET>";
ccloud.environment=<CLOUD_ENVIRONMENT>
ccloud.cluster.api.key=<CLUSTER_API_KEY>
ccloud.cluster.api.secret=<CLUSTER_API_SECRET>
ccloud.cloud.api.key=<CLOUD_API_KEY>
ccloud.cloud.api.secret=<CLOUD_API_SECRET>
topology.builder.ccloud.kafka.cluster.id=<CLUSTER_ID>
ccloud.cluster.url=<CLUSTER_REST_URL>
topology.builder.access.control.class = com.purbon.kafka.topology.roles.CCloudAclsProvider

Environment vars:

    ALLOW_DELETE_BINDINGS: "true"
    ALLOW_DELETE_PRINCIPALS: "true"
    ALLOW_DELETE_TOPICS: "true"
  3. Run Julie from the Docker image.

Expected behavior

Julie should run successfully, translate the service account and create/delete an ACL.

Runtime (please complete the following information):

Additional context

In our setup we generate the topology from data returned by the Confluent CLI, which gives us either the ID of a service account (sa-xxxx) or its name (service-account-name). We tried passing both as principals in the topology (User:sa-xxxx and User:service-account-name).

When running it with ccloud.service_account.translation.enabled=true, we get the above error. When setting ccloud.service_account.translation.enabled=false, no ACL gets created (probably due to the Confluent API still needing the integer id).
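To make the question concrete, here is a minimal sketch of the behaviour one might expect from the translation step. It is not JulieOps code: the function name only mirrors the stack trace, and the lookup table, the `enabled` flag, the `TranslationError` class, and the numeric id 123456 are all assumptions made for illustration.

```python
from typing import Dict


class TranslationError(Exception):
    """Raised when a principal has no entry in the lookup table (hypothetical)."""


def translate_if_necessary(principal: str, lookup: Dict[str, int], enabled: bool = True) -> str:
    """Map 'User:sa-xxxx' or 'User:<name>' to 'User:<integer id>'.

    `lookup` is an assumed table from service account id or name to the
    integer id; how JulieOps builds its own table is not shown in this issue.
    """
    if not enabled:
        return principal  # translation switched off: pass through unchanged
    kind, _, name = principal.partition(":")
    if kind != "User" or name.isdigit():
        return principal  # not a user principal, or already an integer id
    if name not in lookup:
        raise TranslationError(
            f"Translation of principal {principal} failed, "
            "please review your system configuration"
        )
    return f"{kind}:{lookup[name]}"


if __name__ == "__main__":
    table = {"sa-xxxx": 123456, "service-account-name": 123456}  # placeholder values
    print(translate_if_necessary("User:sa-xxxx", table))
```

Under these assumptions, the error in the stack trace would correspond to the branch where the principal is missing from the lookup table, which is why the contents of that table seem like the first thing to check.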

When we were still using the ccloud CLI, we got the account ID of a service account, which was an integer, and put that into our topology. Julie ran successfully with that.
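For reference, a sketch of the consumer entry in the form that worked, with an integer account id as the principal. The value 123456 is a placeholder rather than a real id, and the topic config is omitted for brevity.

```yaml
context: kafka
projects:
  - name: PROJECT_NAME
    topics:
      - name: topic.name
        consumers:
          - principal: User:123456   # integer account id (placeholder)
            group: service-account-name*
```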

Is this a bug with the translation or did we configure something wrong?

Cheers