redpanda-data / redpanda

Redpanda is a streaming data platform for developers. Kafka API compatible. 10x faster. No ZooKeeper. No JVM!
https://redpanda.com

Kafka Streams GlobalKTables example hangs on 1st try #2780

Open NyaliaLui opened 2 years ago

NyaliaLui commented 2 years ago

In the Kafka Streams examples, there is an example that shows how to join data using a GlobalKTable. The example hangs on its first run, but subsequent runs succeed with the expected output.
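For context on the expected output, here is a plain-Java model of what the GlobalKTables example computes. It does not use the Kafka Streams API; the `Order` and `EnrichedOrder` classes and the `enrich` method are hypothetical stand-ins. Each order is looked up by `customer_id` and `product_id` in fully replicated tables (a GlobalKTable is replicated to every application instance, which is why the `customer` and `product` topics need not share the `order` topic's partition count), and, as with an inner KStream-GlobalKTable join, orders with no match are dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a KStream-GlobalKTable inner join (no Kafka dependency).
// All class and method names here are hypothetical illustrations.
public class GlobalJoinModel {

    static final class Order {
        final long customerId;
        final long productId;
        Order(long customerId, long productId) {
            this.customerId = customerId;
            this.productId = productId;
        }
    }

    // Enriched result: the order plus the joined customer and product names.
    static final class EnrichedOrder {
        final Order order;
        final String customerName;
        final String productName;
        EnrichedOrder(Order order, String customerName, String productName) {
            this.order = order;
            this.customerName = customerName;
            this.productName = productName;
        }
    }

    // Each instance holds a full copy of both "global" tables, so lookups
    // need no co-partitioning with the order stream.
    static List<EnrichedOrder> enrich(List<Order> orders,
                                      Map<Long, String> customers,
                                      Map<Long, String> products) {
        List<EnrichedOrder> out = new ArrayList<>();
        for (Order o : orders) {
            String customer = customers.get(o.customerId);
            if (customer == null) continue; // inner join: drop unmatched order
            String product = products.get(o.productId);
            if (product == null) continue;  // inner join: drop unmatched order
            out.add(new EnrichedOrder(o, customer, product));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Long, String> customers = new HashMap<>();
        customers.put(3L, "koaUMOvNBu");
        Map<Long, String> products = new HashMap<>();
        products.put(2L, "GR06a073mW");

        List<Order> orders = new ArrayList<>();
        orders.add(new Order(3, 2)); // matches both tables
        orders.add(new Order(9, 2)); // no customer 9, so it is dropped

        for (EnrichedOrder e : enrich(orders, customers, products)) {
            System.out.println(e.customerName + " ordered " + e.productName);
        }
    }
}
```

Running `main` prints a single line, `koaUMOvNBu ordered GR06a073mW`. In the real library, global tables are also bootstrapped from their topics before stream processing begins; this model skips that step entirely.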

Steps to reproduce:

Requirements: follow the build steps for kafka-streams here. You'll need Apache Maven and Java 8+ (I use Java 11).

My Redpanda .yaml files are found here.

  1. Run the Redpanda broker:

    <path to build root>/release/clang/bin/redpanda --redpanda-cfg ~/local-cluster/single/single.yaml --smp=1

  2. Create topics:

    rpk topic create order -p 4 -r 1
    rpk topic create customer -p 3 -r 1
    rpk topic create product -p 2 -r 1
    rpk topic create enriched-order -p 4 -r 1
  3. From within the kafka-streams-examples directory, run the GlobalKTables example (this blocks until the next step):

    java -cp target/kafka-streams-examples-6.2.0-standalone.jar io.confluent.examples.streams.GlobalKTablesExample
  4. Run the example's driver, which generates load:

    java -cp target/kafka-streams-examples-6.2.0-standalone.jar io.confluent.examples.streams.GlobalKTablesAndStoresExampleDriver

Results

On the first run, the driver in step 4 produces no output. Terminate both processes with Ctrl-C and redo steps 3 and 4; you should then see output similar to the following:

    {"product": {"name": "GR06a073mW", "description": "002iz", "supplier_name": "BUNvw4q1TRiZMpWtT5z5"}, "customer": {"name": "koaUMOvNBu", "gender": "male", "region": "B4ItOh0XEo1DRaC8Asfc"}, "order": {"customer_id": 3, "product_id": 2, "time_order_placed_at": 5514553331772846669}}
    {"product": {"name": "GR06a073mW", "description": "002iz", "supplier_name": "BUNvw4q1TRiZMpWtT5z5"}, "customer": {"name": "koaUMOvNBu", "gender": "male", "region": "B4ItOh0XEo1DRaC8Asfc"}, "order": {"customer_id": 3, "product_id": 2, "time_order_placed_at": -180651686919023729}}
    {"product": {"name": "NFNBDRyfLj", "description": "OYHjl", "supplier_name": "Zt5WNskd1cCARk7LxHai"}, "customer": {"name": "PNqA0x8GjR", "gender": "female", "region": "ISP1pVJT5L4wSD3yg9Mv"}, "order": {"customer_id": 2, "product_id": 3, "time_order_placed_at": -6350599911337657398}}

The example should work the first time without restarting.

JIRA Link: CORE-771

NyaliaLui commented 2 years ago

Related: #2567

dotnwat commented 2 years ago

Seems like it might be a race. One experiment would be to arrange for the consumer group and transactions internal topics to be created through some other means before trying the streams example. I'm not sure what other races might exist, but this looks like an interesting issue you have found.
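If the race hypothesis holds, one way to script the experiment is to gate application startup on a readiness check instead of starting immediately. The sketch below is a generic poll-with-backoff helper in plain Java; `ReadinessGate`, `waitUntil`, and the readiness predicate are hypothetical, and a real check would query the broker (for example, whether the consumer group and transactions internal topics already exist) rather than the counter used here.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll a readiness check with exponential backoff
// before starting the application, to rule out a startup race.
public class ReadinessGate {

    // Returns true as soon as `ready` reports true, or false if all
    // `maxAttempts` checks fail (or the thread is interrupted). The delay
    // doubles after each failed check, capped at `maxDelay`.
    static boolean waitUntil(BooleanSupplier ready, int maxAttempts,
                             Duration initialDelay, Duration maxDelay) {
        long delayMs = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
                delayMs = Math.min(delayMs * 2, maxDelay.toMillis());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in check that "becomes ready" on the third poll.
        int[] polls = {0};
        boolean ok = waitUntil(() -> ++polls[0] >= 3, 5,
                               Duration.ofMillis(10), Duration.ofMillis(100));
        System.out.println("ready=" + ok + " after " + polls[0] + " polls");
    }
}
```

Running `main` prints `ready=true after 3 polls`. Bounding the number of attempts keeps a misconfigured cluster from blocking startup forever, which matters when the symptom under investigation is itself a hang.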