thingsboard / tbmq

Open-source, scalable, and fault-tolerant MQTT broker able to handle 4M+ concurrent client connections, supporting at least 3M messages per second throughput per single cluster node with low latency delivery. The cluster mode supports more than 100M concurrently connected clients.
https://thingsboard.io/products/mqtt-broker/
Apache License 2.0

[Question] Application Shared Subscriptions #88

Closed gardnerd closed 9 months ago

gardnerd commented 9 months ago

Component

Description My team is running into some issues with TBMQ around shared subscriptions (or at least with how I understand they are supposed to work).

I have a Shared Subscription created to handle all topics, using "#" as the MQTT filter. It's configured with two partitions, and I have two Application clients connected to the shared subscription "$share/all-messages/#", both with persistent connections. The Kafka topic seems to be created properly and the consumer groups look correct, but the broker does not appear to distribute the messages round robin. Once I connect one client, it receives all the messages until I connect the second client, after which the messages switch over to the newly connected client. Basically, it's only sending messages to one client at a time. If I disconnect one of the clients, the messages move over to the other connected client, so it works as a failover, but I can't seem to get round robin working.

So in short, are there any configuration steps I need to take to get round robin working with Application clients, or am I missing something about how to set up shared subscriptions?

Environment

dmytro-landiak commented 9 months ago

hey @gardnerd !

Thank you for raising the question and explaining the issue so clearly.

Based on the details you provided, both your configuration and the shared subscription setup for Application clients look correct. The behavior you see comes from Kafka itself. The Kafka producer inside TBMQ sends messages from MQTT clients (MQTT publishers) to the Kafka topic with partition = null and key = null. This logic applies only when publishing messages to Application shared subscription topics. All producers use the default partitioner (see below). Here are some details about how it works:

[Image] Source: https://redpanda.com/guides/kafka-tutorial/kafka-partition-strategy

The explanation here is a bit more precise: https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html#partitioner-class

The important part is: "If no partition or key is present, choose the sticky partition that changes when at least batch.size bytes are produced to the partition."
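To make the quoted rule concrete, here is a minimal Python sketch of it. It is a toy model, not Kafka's actual partitioner: the function name `sticky_assign` is made up for illustration, the next partition is chosen by simply stepping to the next index (the real client chooses differently), and the message sizes are assumptions. The value 16384 is the Kafka client default for batch.size; TBMQ may configure its producers differently.

```python
from collections import Counter

def sticky_assign(message_sizes, num_partitions, batch_size):
    """Toy model of the quoted rule: keep writing to one partition until
    at least batch_size bytes were produced to it, then move to another."""
    partition = 0
    bytes_on_current = 0
    assignments = []
    for size in message_sizes:
        assignments.append(partition)
        bytes_on_current += size
        if bytes_on_current >= batch_size:
            # Assumption: step to the next partition; Kafka picks it differently.
            partition = (partition + 1) % num_partitions
            bytes_on_current = 0
    return assignments

# 100 keyless messages of 100 bytes each (assumed size), 2 partitions.
low_volume = sticky_assign([100] * 100, num_partitions=2, batch_size=16384)
small_batch = sticky_assign([100] * 100, num_partitions=2, batch_size=200)

print(Counter(low_volume))   # every message lands on a single partition
print(Counter(small_batch))  # messages alternate in pairs across both partitions
```

With the larger batch size, the 10,000 bytes produced never reach the threshold, so only one partition (and therefore only one Application client) sees traffic. With a small batch size, the partition switches every two messages.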

Since the Kafka topic has 2 partitions and you connect 2 App clients, each client is assigned 1 partition. For better visibility and control, I recommend deploying a Kafka UI to the cluster. Here is the YAML definition (it uses Redpanda Console).

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: tb-broker-kafka-ui
spec:
  serviceName: tb-broker-kafka-ui
  replicas: 1
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: tb-broker-kafka-ui
  template:
    metadata:
      labels:
        app: tb-broker-kafka-ui
    spec:
      containers:
        - name: server
          imagePullPolicy: IfNotPresent
          image: docker.redpanda.com/redpandadata/console:latest
          resources:
            requests:
              cpu: 200m
              memory: 200Mi
            limits:
              cpu: 200m
              memory: 200Mi
          ports:
            - containerPort: 8080
              name: http8080
          env:
            - name: KAFKA_BROKERS
              value: "PUT_KAFKA_URL_HERE"
      restartPolicy: Always

Note: replace PUT_KAFKA_URL_HERE with the actual URL.

After it is deployed, you can port-forward it and open it in the browser:

kubectl port-forward tb-broker-kafka-ui-0 8080:8080

What may be interesting for you is how partitions receive messages: [image]

I can reproduce this behavior as well. It is easier to reproduce when the message rate is low. When the rate is high enough, messages start arriving in both partitions.

In any case, we will look into this behavior in upcoming releases and try to make the distribution smoother.
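The dependence on message rate follows from simple arithmetic: if the partition only changes after batch.size bytes, the time spent on one partition is roughly batch.size divided by the byte rate. The sketch below is a back-of-the-envelope estimate under stated assumptions: 16384 is the Kafka client default for batch.size (TBMQ may use another value), the 100-byte message size is hypothetical, and record overhead and linger settings are ignored.

```python
def seconds_on_one_partition(batch_size_bytes, message_size_bytes, messages_per_second):
    """Rough time before the sticky partition changes, ignoring record overhead."""
    return batch_size_bytes / (message_size_bytes * messages_per_second)

BATCH_SIZE = 16384   # Kafka client default; an assumption for TBMQ
MESSAGE_SIZE = 100   # hypothetical payload size in bytes

for rate in (1, 100, 10_000):
    seconds = seconds_on_one_partition(BATCH_SIZE, MESSAGE_SIZE, rate)
    print(f"{rate} msg/s: about {seconds:.3f} s on one partition")
```

Under these assumptions, a publisher sending 1 message per second stays on one partition for a few minutes, which looks like failover rather than round robin, while at 10,000 messages per second the partition changes many times per second. Lowering the batch size shortens the interval in the same proportion.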

If you have any more questions, please let me know.

Best regards!

P.S. From the official Kafka docs: [image]

gardnerd commented 9 months ago

@dmytro-landiak Thank you so much for the quick response and explanation. The partition balancing coming from Kafka makes perfect sense, and I'll keep that in mind moving forward. I was able to reproduce the behavior you described with a high message rate as well. You were correct to assume that I was only using a low rate in my testing. I was also able to get closer to round robin behavior at a low message rate by changing the linger-ms and batch-size settings for the producer.