pulumi / pulumi-kafka

A Kafka Pulumi resource package, providing multi-language access to Kafka
Apache License 2.0

Error waiting for topic (on second deploy) #419

Open mattfysh opened 1 month ago

mattfysh commented 1 month ago

What happened?

With the latest version, the first deploy of a topic succeeds, but subsequent deploys fail with the error below, even when the resource has not changed (I use the --refresh flag with pulumi up):

Error waiting for topic (TOPIC_NAME) to become ready: couldn't find resource (21 retries)

Example

new kafka.Topic(
  'my-topic',
  {
    name,
    partitions: 1,
    replicationFactor: 1,
    config: {
      'cleanup.policy': 'delete',
      'retention.ms': 604800000,
      'retention.bytes': 1073741824,
      'max.message.bytes': 1048576,
    },
  },
  { provider }
)

Output of pulumi about

CLI
Version      3.115.2
Go Version   go1.22.2
Go Compiler  gc

Host
OS           darwin
Version      14.4.1
Arch         arm64

Additional context

I use redpanda so I put this in my provider config: kafkaVersion: '2.1.0'
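For reference, my provider setup looks roughly like this (a minimal sketch; the broker address is a placeholder, not my real endpoint):

import * as kafka from '@pulumi/kafka'

const provider = new kafka.Provider('kafka-provider', {
  bootstrapServers: ['localhost:9092'], // placeholder broker address
  tlsEnabled: false,
  kafkaVersion: '2.1.0', // pinned because I use redpanda, as noted above
})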

There is also a very similar bug report on the Mongey repo from 2018: https://github.com/Mongey/terraform-provider-kafka/issues/35

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

VenelinMartinov commented 1 month ago

Hi @mattfysh, I have not managed to reproduce the issue.

Are you certain you've set up access to the kafka instance correctly?

If yes, could you please add a full pulumi program which reproduces this problem along with any necessary config? We can't really do much about the problem if we are not able to reproduce it.

mattfysh commented 1 month ago

Hi @VenelinMartinov thank you for the reply, I've created a repro here: https://github.com/mattfysh/mongey-redpanda

Please let me know if you have any trouble reproducing it, thanks!

VenelinMartinov commented 1 month ago

Hi @mattfysh, I have not been able to repro with the provided code. I added an EKS cluster to make the whole program self-sufficient, but I keep getting:

error: 1 error occurred:
        * Helm release "mongey-redpanda/redpanda-a60cac14" failed to initialize completely. Use Helm CLI to investigate.: failed to become available within allocated timeout. Error: Helm Release mongey-redpanda/redpanda-a60cac14: client rate limiter Wait returned an error: context deadline exceeded

I noticed that in your code you are using an Output in the kafka provider config, which is a known pain point in the engine and might be causing the issue. I tried to repro with:

import * as kafka from '@pulumi/kafka'
import * as pulumi from '@pulumi/pulumi'
import * as command from '@pulumi/command'

// Local commands that emit the broker host/ports, so the provider config is
// built from Outputs rather than static strings.
const hostCmd = new command.local.Command('host', { create: 'echo localhost' })
const portCmd = new command.local.Command('port', { create: 'echo 9093' })
const port1Cmd = new command.local.Command('port1', { create: 'echo 9092' })

export const brokers = pulumi.interpolate`${hostCmd.stdout}:${portCmd.stdout},localhost:${port1Cmd.stdout}`

const provider = new kafka.Provider('kafka-provider', {
  bootstrapServers: brokers.apply(x => x.split(',')),
  tlsEnabled: false,
  kafkaVersion: '2.1.0',
})

new kafka.Topic(
  'test-topic',
  {
    name: 'test.topic2',
    partitions: 1,
    replicationFactor: 1,
    config: {
      'cleanup.policy': 'delete',
      'retention.ms': 604800000,
      'retention.bytes': 1073741824,
      'max.message.bytes': 1048576,
    },
  },
  { provider }
)

But got a different error.

In any case, could you attempt to put the whole kafka code inside an apply?

brokers.apply(brok => {
  const provider = new kafka.Provider('kafka-provider', {
    bootstrapServers: brok.split(','),
    tlsEnabled: false,
    saslMechanism: 'scram-sha512',
    // user and pass are SASL credentials assumed to be defined elsewhere
    saslUsername: user,
    saslPassword: pass,
    kafkaVersion: '2.1.0',
  })

  new kafka.Topic(
    'test-topic',
    {
      name: 'test.topic',
      partitions: 1,
      replicationFactor: 1,
      config: {
        'cleanup.policy': 'delete',
        'retention.ms': 604800000,
        'retention.bytes': 1073741824,
        'max.message.bytes': 1048576,
      },
    },
    { provider }
  )
})

Let me know if that fixes the issue.

mattfysh commented 1 month ago

hey there, thanks for taking a look! If you want to deploy the redpanda helm chart on EKS you may need to tweak the chart values. I think the local storage.hostPath may cause issues with a remote cluster, as will the usage of 'localhost'.
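Something like this might be needed (untested sketch; the storage keys are my reading of the redpanda chart's values, so treat them as assumptions):

import * as k8s from '@pulumi/kubernetes'

// Hypothetical values override for a remote (EKS) cluster: avoid the
// node-local hostPath mount and use a persistent volume instead.
const release = new k8s.helm.v3.Release('redpanda', {
  chart: 'redpanda',
  repositoryOpts: { repo: 'https://charts.redpanda.com' },
  values: {
    storage: {
      hostPath: '',                        // don't pin data to a local path
      persistentVolume: { enabled: true }, // let the cluster provision storage
    },
  },
})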

I will try tomorrow to see if using static values for provider config fixes things, thanks again

mattfysh commented 1 month ago

I tried with a static value for the provider config and still kept getting the error, but I think I've located what might be causing the problem: the --refresh operation is getting values back from the redpanda API that don't match the pulumi values, specifically under the topic.config map.

I've included the logs of the refresh operation from Pulumi Cloud below; you can see some churn there regarding the config map. Using the diff values as a guide, I changed the config map from this:

config: {
  'cleanup.policy': 'delete',
  'retention.ms': 604800000,
  'retention.bytes': 1073741824,
  'max.message.bytes': 1048576,
}

to this (removed cleanup.policy and changed the remaining values from numbers to strings):

config: {
  'retention.ms': '604800000',
  'retention.bytes': '1073741824',
  'max.message.bytes': '1048576',
}

After making that change I no longer see the error. But this config map used to work, so I'm unsure what has changed recently to break it. I would also like to be able to set a cleanup.policy value, but it seems I cannot do so right now without hitting the error on refresh.
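For anyone else hitting this, the full working topic declaration now looks like this (the same resource as above, just with string values and no cleanup.policy):

// Workaround: declare every config value as a string so the values Pulumi
// stores match the strings the broker reports back on refresh.
new kafka.Topic(
  'test-topic',
  {
    name: 'test.topic',
    partitions: 1,
    replicationFactor: 1,
    config: {
      // 'cleanup.policy' omitted: redpanda was not reporting it back,
      // which reintroduced the refresh diff shown below.
      'retention.ms': '604800000',
      'retention.bytes': '1073741824',
      'max.message.bytes': '1048576',
    },
  },
  { provider }
)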

1. Summary

     Type                              Name                 Status                  Info
     pulumi:pulumi:Stack               mongey-redpanda-dev  **failed**              1 error; 2 messages
 ~   β”œβ”€ kafka:index:Topic              test-topic           **updating failed**     [diff: ~config]; 1 error
     β”œβ”€ kubernetes:core/v1:Secret      sasl                                         
     β”œβ”€ kubernetes:helm.sh/v3:Release  redpanda                                     
     β”œβ”€ kubernetes:core/v1:Namespace   redpanda-ns                                  
     └─ pulumi:providers:kafka         kafka-provider                               

Diagnostics:
  pulumi:pulumi:Stack (mongey-redpanda-dev):
    (node:27741) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
    (Use `node --trace-deprecation ...` to show where the warning was created)
    error: update failed

  kafka:index:Topic (test-topic):
    error: 1 error occurred:
        * updating urn:pulumi:dev::mongey-redpanda::kafka:index/topic:Topic::test-topic: 1 error occurred:
        * Error waiting for topic (test.topic) to become ready: couldn't find resource (21 retries)

Resources:
    5 unchanged

2. Diff

  pulumi:pulumi:Stack: (same)
    [urn=urn:pulumi:dev::mongey-redpanda::pulumi:pulumi:Stack::mongey-redpanda-dev]
(node:27741) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.

(Use `node --trace-deprecation ...` to show where the warning was created)

    ~ kafka:index/topic:Topic: (update)
        [id=test.topic]
        [urn=urn:pulumi:dev::mongey-redpanda::kafka:index/topic:Topic::test-topic]
        [provider=urn:pulumi:dev::mongey-redpanda::pulumi:providers:kafka::kafka-provider::2f7e5eb8-5aaa-4236-b4d3-7adc25754386]
      ~ config: {
          + cleanup.policy   : "delete"
          ~ max.message.bytes: "1048576" => 1048576
          ~ retention.bytes  : "1073741824" => 1073741824
          ~ retention.ms     : "604800000" => 604800000
        }
error: 1 error occurred:
    * updating urn:pulumi:dev::mongey-redpanda::kafka:index/topic:Topic::test-topic: 1 error occurred:
    * Error waiting for topic (test.topic) to become ready: couldn't find resource (21 retries)

error: update failed

~ pulumi:pulumi:Stack: (refresh)
    [urn=urn:pulumi:dev::mongey-redpanda::pulumi:pulumi:Stack::mongey-redpanda-dev]
    ~ pulumi:providers:kafka: (refresh)
        [id=2f7e5eb8-5aaa-4236-b4d3-7adc25754386]
        [urn=urn:pulumi:dev::mongey-redpanda::pulumi:providers:kafka::kafka-provider]
    ~ kafka:index/topic:Topic: (refresh)
        [id=test.topic]
        [urn=urn:pulumi:dev::mongey-redpanda::kafka:index/topic:Topic::test-topic]
        [provider=urn:pulumi:dev::mongey-redpanda::pulumi:providers:kafka::kafka-provider::2f7e5eb8-5aaa-4236-b4d3-7adc25754386]
        --outputs:--
      ~ config           : {
          - cleanup.policy   : "delete"
            max.message.bytes: "1048576"
            retention.bytes  : "1073741824"
            retention.ms     : "604800000"
        }
    ~ kubernetes:core/v1:Secret: (refresh)
        [id=mongey-redpanda/sasl-96805bab]
        [urn=urn:pulumi:dev::mongey-redpanda::kubernetes:core/v1:Secret::sasl]
        [provider=urn:pulumi:dev::mongey-redpanda::pulumi:providers:kubernetes::default_4_11_0::5166f4db-b85b-42d1-8ab9-7c5a04543cb4]
    ~ kubernetes:core/v1:Namespace: (refresh)
        [id=mongey-redpanda]
        [urn=urn:pulumi:dev::mongey-redpanda::kubernetes:core/v1:Namespace::redpanda-ns]
        [provider=urn:pulumi:dev::mongey-redpanda::pulumi:providers:kubernetes::default_4_11_0::5166f4db-b85b-42d1-8ab9-7c5a04543cb4]
    ~ kubernetes:helm.sh/v3:Release: (refresh)
        [id=mongey-redpanda/redpanda-a54a62bb]
        [urn=urn:pulumi:dev::mongey-redpanda::kubernetes:helm.sh/v3:Release::redpanda]
        [provider=urn:pulumi:dev::mongey-redpanda::pulumi:providers:kubernetes::default_4_11_0::5166f4db-b85b-42d1-8ab9-7c5a04543cb4]

3. Diagnostics

Diagnostics:
  pulumi:pulumi:Stack
    (node:28056) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
    (Use `node --trace-deprecation ...` to show where the warning was created)

mattfysh commented 1 month ago

After looking into the redpanda repo, I think I may have found the root cause of this issue: https://github.com/redpanda-data/redpanda/pull/17456

I still don't understand the cause of the error, though: if the redpanda API is returning no value (or the wrong value) for cleanup.policy, how does that cause the refresh operation to fail?
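My guess at the mechanism (a sketch only, not verified against the provider source): after an update, the provider polls the topic until the config it reads back matches the desired config, and the retry helper reports a state that never converges as not found once its retries run out. Something like:

// Illustrative pseudocode only; the real wait loop lives in the Go provider
// (terraform-provider-kafka). All names here are hypothetical, not the
// provider's actual API.
type TopicConfig = Record<string, string>

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms))

function configsMatch(desired: TopicConfig, actual: TopicConfig): boolean {
  return Object.entries(desired).every(([k, v]) => actual[k] === v)
}

async function waitForTopicReady(
  readTopicConfig: () => Promise<TopicConfig>, // hypothetical admin-API reader
  desired: TopicConfig,
  retries = 21,
) {
  for (let i = 0; i < retries; i++) {
    const actual = await readTopicConfig()
    // If the broker never echoes 'cleanup.policy' back, desired and actual
    // never converge, so every poll looks like "not ready / not found".
    if (configsMatch(desired, actual)) return
    await sleep(1000)
  }
  throw new Error(`couldn't find resource (${retries} retries)`)
}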

Are you happy to leave this issue open while we investigate potential mitigations in either the pulumi or terraform provider? Thanks!