pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
https://pathway.com

[Bug]: rdkafka consumer queue not available #6

Closed: stairclimber closed this issue 5 months ago

stairclimber commented 5 months ago

Steps to reproduce

When I use the Kafka connector, the program crashes and reports "rdkafka consumer queue not available".

Relevant log output

Traceback (most recent call last):
  File "/home/ubuntu/Desktop/syscall_ids/pp.py", line 34, in <module>
    pw.run()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/runtime_type_check.py", line 19, in with_type_validation
    return beartype.beartype(f)(*args, **kwargs)
  File "<@beartype(pathway.internals.run.run) at 0x7fc66df8fc70>", line 107, in run
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/run.py", line 41, in run
    ).run_outputs()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/__init__.py", line 91, in run_outputs
    return self._run(context, after_build=after_build)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/__init__.py", line 139, in _run
    return api.run_with_new_graph(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/__init__.py", line 117, in logic
    storage_graph.build_scope(scope, state, self)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/storage_graph.py", line 326, in build_scope
    handler.run(operator, self.output_storages[operator])
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/operator_handler.py", line 81, in run
    self._run(operator, output_storages)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/pathway/internals/graph_runner/operator_handler.py", line 132, in _run
    materialized_table = self.scope.connector_table(
ValueError: Creating Kafka consumer failed: Client creation error: rdkafka consumer queue not available
Occurred here:
    Line: t = pw.io.kafka.read(
    File: /home/ubuntu/Desktop/syscall_ids/pp.py:17

What did you expect to happen?

I expected the Kafka connector to work without errors.

Version

0.7.9

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

x86-64

How I deployed Kafka

Kafka runs in Docker:

version: '2'

networks:
  app-tier:
    driver: bridge

services:
  kafka:
    image: 'bitnami/kafka:latest'
    networks:
      - app-tier
    ports:
      - 9094:9094
    container_name: kafka
    hostname: kafka
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093,EXTERNAL://:9094
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://:9094
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    container_name: kafka-ui
    hostname: kafka-ui
    networks:
      - app-tier
    ports: 
      - 19094:8080
    depends_on:
      - kafka
    environment:
      - DYNAMIC_CONFIG_ENABLED=true
      - KAFKA_CLUSTERS_0_NAME=aptcapture-syscall-channel
      - KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=kafka:9092

rdkafka config

kafka_settings = {"bootstrap.servers": "10.8.56.233:9094", "group.id": ""}

dxtrous commented 5 months ago

Hi @stairclimber,

Please double-check the kafka_settings configuration for the correct value of group.id (possibly the default is not an empty string for your container setup; try skipping this parameter completely and/or checking what value should be filled in).

Then, before we dive deeper, could you please confirm that you are able to connect to your Kafka with a test producer/consumer pair at 10.8.56.233:9094 as described here, running test scripts from the same machine as the one you are running Pathway on? https://github.com/bitnami/containers/blob/main/bitnami/kafka/README.md#producer-and-consumer-using-external-client
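
A quick first check, before running a full producer/consumer pair, is whether the external listener port is reachable at all from the machine running Pathway. The following is a minimal sketch using only the Python standard library; the helper name `can_connect` is hypothetical and not part of Pathway or rdkafka. A successful TCP connection only shows that the port is open. It does not prove that the broker's advertised listeners are configured correctly.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration: it checks network reachability
    only and does not speak the Kafka protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Usage with the address from the kafka_settings in this issue:
#   can_connect("10.8.56.233", 9094)
```

If this returns False, the problem is at the network or Docker port-mapping level rather than in the consumer configuration.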

stairclimber commented 5 months ago

Thanks a lot. After reading the rdkafka documentation, I updated my code as follows, and the problem was resolved.

kafka_settings = {
    "bootstrap.servers": "kafka:9094",
    "group.id": "test",
    "enable.auto.commit": "false",
}

I think that even though I did not configure a group.id on my Kafka broker, I still need to set some non-empty string for group.id in the rdkafka client; then rdkafka works. Anyway, thanks again for your prompt reply. Pathway is an awesome data processing library. Best wishes to your team and to Pathway.
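
The resolution above suggests a simple pre-flight check on the settings dictionary before it is passed to pw.io.kafka.read. The sketch below assumes, based only on this thread, that the consumer needs a non-empty group.id; the function name `check_consumer_settings` is hypothetical and not part of Pathway or rdkafka.

```python
def check_consumer_settings(settings: dict) -> list:
    """Return a list of human-readable problems found in the settings.

    Hypothetical helper: it encodes the assumption from this thread that
    an empty or missing group.id prevents the consumer from being created.
    """
    problems = []
    for key in ("bootstrap.servers", "group.id"):
        # Treat a missing key, an empty string, and whitespace alike.
        if not str(settings.get(key, "")).strip():
            problems.append(f"{key} is missing or empty")
    return problems


# The original settings from this issue are flagged for group.id,
# while the updated settings pass the check:
#   check_consumer_settings({"bootstrap.servers": "10.8.56.233:9094", "group.id": ""})
#   check_consumer_settings({"bootstrap.servers": "kafka:9094", "group.id": "test"})
```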