kafka-ops / julie

A solution to help you build automation and gitops in your Apache Kafka deployments. The Kafka gitops!
MIT License

Handling of pagination of service accounts in Confluent Cloud broken #530

Closed maxschorn closed 2 years ago

maxschorn commented 2 years ago

Hi, we saw an error when executing Julie in our environment and after some investigation we found a bug that causes it.

Describe the bug

We get the error below when executing Julie:

com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException: Unrecognized field "errors" (class com.purbon.kafka.topology.api.ccloud.response.ListServiceAccountResponse), not marked as ignorable (4 known properties: "data", "api_version", "kind", "metadata"])
 at [Source: (String)"***
  "errors": [
    ***
      "id": "xxx",
      "status": "400",
      "detail": "failed to base64 decode page cursor",
      "source": ***
    ***
  ]
***"; line: 2, column: 14] (through reference chain: com.purbon.kafka.topology.api.ccloud.response.ListServiceAccountResponse["errors"])
  at com.fasterxml.jackson.databind.exc.UnrecognizedPropertyException.from(UnrecognizedPropertyException.java:61)
  at com.fasterxml.jackson.databind.DeserializationContext.handleUnknownProperty(DeserializationContext.java:987)
  at com.fasterxml.jackson.databind.deser.std.StdDeserializer.handleUnknownProperty(StdDeserializer.java:1974)
  at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownProperty(BeanDeserializerBase.java:1701)
  at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.handleUnknownVanilla(BeanDeserializerBase.java:1679)
  at com.fasterxml.jackson.databind.deser.BeanDeserializer.vanillaDeserialize(BeanDeserializer.java:330)
  at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:187)
  at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:322)
  at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4593)
  at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3548)
  at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3516)
  at com.purbon.kafka.topology.utils.JSON.toObject(JSON.java:53)
  at com.purbon.kafka.topology.api.ccloud.CCloudApi.getListServiceAccounts(CCloudApi.java:162)
  at com.purbon.kafka.topology.api.ccloud.CCloudApi.listServiceAccounts(CCloudApi.java:121)
  at com.purbon.kafka.topology.serviceAccounts.CCloudPrincipalProvider.listServiceAccounts(CCloudPrincipalProvider.java:27)
  at com.purbon.kafka.topology.AbstractPrincipalManager.printCurrentState(AbstractPrincipalManager.java:112)
  at com.purbon.kafka.topology.JulieOps.run(JulieOps.java:217)
  at com.purbon.kafka.topology.JulieOps.run(JulieOps.java:227)
  at com.purbon.kafka.topology.CommandLineInterface.processTopology(CommandLineInterface.java:212)
  at com.purbon.kafka.topology.CommandLineInterface.run(CommandLineInterface.java:161)
  at com.purbon.kafka.topology.CommandLineInterface.main(CommandLineInterface.java:147)

We turned on debug logs and saw the following calls being made by Julie:

...
[DEBUG] 2022-08-19 09:35:12.405 [main] JulieHttpClient - method: GET response: (GET https://api.confluent.cloud/iam/v2/service-accounts?page_size=100) 200
[DEBUG] 2022-08-19 09:35:12.421 [main] JulieHttpClient - method: GET request.uri: https://api.confluent.cloud/iam/v2/service-accounts?page_token=<page_token>?page_size=100
...

We found that we have more than 100 service accounts in our Confluent Cloud org. The first response therefore contains a 'next' field in its 'metadata' block, and Julie uses that field to request the next page.
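For illustration, following the 'next' field until it is absent can be sketched as below. This is a minimal sketch, not Julie's actual code: the `Page` class, the `listAll` method, and the in-memory map standing in for HTTP calls are all hypothetical, and it assumes 'next' carries the complete URL of the following page and is missing (null) on the last page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class PaginationSketch {
    /** Hypothetical page shape: the items of one page plus the next-page URL (null on the last page). */
    static final class Page {
        final List<String> data;
        final String next;

        Page(List<String> data, String next) {
            this.data = data;
            this.next = next;
        }
    }

    /** Follows 'next' links until a page has none, collecting the items of every page. */
    static List<String> listAll(String firstUrl, Function<String, Page> fetch) {
        List<String> all = new ArrayList<>();
        String url = firstUrl;
        while (url != null) {
            Page page = fetch.apply(url);
            all.addAll(page.data);
            url = page.next; // null ends the loop
        }
        return all;
    }

    public static void main(String[] args) {
        // Stand-in for real HTTP calls: two canned pages keyed by request URL.
        Map<String, Page> fake = Map.of(
            "/service-accounts?page_size=2",
            new Page(List.of("sa-1", "sa-2"), "/service-accounts?page_token=t1&page_size=2"),
            "/service-accounts?page_token=t1&page_size=2",
            new Page(List.of("sa-3"), null));
        System.out.println(listAll("/service-accounts?page_size=2", fake::get));
    }
}
```

Passing the fetcher as a `Function` keeps the loop independent of the HTTP client, so the paging logic can be exercised without calling Confluent Cloud.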

The URL constructed for the second request is malformed, which leads to the error above: it contains two '?' separators ('?page_token=...?page_size=100') instead of one '?' followed by '&'.

We cannot work around this by increasing the page size, as 100 is the maximum page size of the API.

I suspect this line is the cause: https://github.com/kafka-ops/julie/blob/1991611cdfadd41978cc8d27aad2e8e263fc2dd1/src/main/java/com/purbon/kafka/topology/api/ccloud/CCloudApi.java#L136
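One way to avoid the doubled '?' is to pick the separator based on whether the URL already has a query string. The following is a minimal sketch under that assumption, not the fix that was actually applied in Julie; `QueryParams` and `append` are hypothetical names, and the token value `abc` is a placeholder.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryParams {
    /** Appends key=value, using '?' for the first query parameter and '&' for any later one. */
    static String append(String url, String key, String value) {
        // A '?' already in the URL means a query string has been started.
        String separator = url.contains("?") ? "&" : "?";
        return url + separator
            + URLEncoder.encode(key, StandardCharsets.UTF_8)
            + "="
            + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A 'next' URL that already carries a page token.
        String next = "https://api.confluent.cloud/iam/v2/service-accounts?page_token=abc";
        // prints https://api.confluent.cloud/iam/v2/service-accounts?page_token=abc&page_size=100
        System.out.println(append(next, "page_size", "100"));
    }
}
```

The sketch does not handle URLs with a fragment or a trailing bare '?'; a URI builder from an HTTP library would cover those cases.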

To Reproduce

Steps to reproduce the behavior:

  1. Have enough service accounts in Confluent Cloud, and set ccloud.service_account.query.page.size to a value small enough, that listing service accounts returns multiple pages
  2. Provide a topology which creates or clears some bindings, so that listServiceAccounts gets called
  3. Use the following additional config:
    security.protocol=SASL_SSL
    ssl.endpoint.identification.algorithm=https
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
    username="<CLUSTER_API_KEY>" \
    password="<CLUSTER_API_SECRET>";
    ccloud.environment=<CLOUD_ENVIRONMENT>
    ccloud.cluster.api.key=<CLUSTER_API_KEY>
    ccloud.cluster.api.secret=<CLUSTER_API_SECRET>
    ccloud.cloud.api.key=<CLOUD_API_KEY>
    ccloud.cloud.api.secret=<CLOUD_API_SECRET>
    topology.builder.ccloud.kafka.cluster.id=<CLUSTER_ID>
    ccloud.cluster.url=<CLUSTER_REST_URL>
    topology.builder.access.control.class = com.purbon.kafka.topology.roles.CCloudAclsProvider
    ccloud.service_account.translation.enabled=false
    julie.verify.remote.state=true
    julie.http.retry.times=20
    julie.http.retry.backoff.time.ms=30000
  4. Run Julie
  5. Julie will fail when it tries to list service accounts

Expected behavior

Julie should correctly handle a paginated response from the Confluent Cloud REST API v2.


purbon commented 2 years ago

Thanks a lot for reporting this, @maxschorn. I was able to reproduce the error, and it is going to be fixed in the upcoming release.