redpanda-data / console

Redpanda Console is a developer-friendly UI for managing your Kafka/Redpanda workloads. Console gives you a simple, interactive approach for gaining visibility into your topics, masking data, managing consumer groups, and exploring real-time data with time-travel debugging.
https://redpanda.com

Attempting to read protobufs from the file-system twice #336

Closed · JSchulte01 closed this issue 2 years ago

JSchulte01 commented 2 years ago

Kowl is unable to read protobufs from the file system correctly: it appears to find two protobuf files despite there being only one in the directory. I have confirmed there is exactly one protobuf file available, yet it still tries to read two and raises an error about duplicate symbols.

Logs

{"level":"info","ts":"2022-03-30T03:52:25.281Z","msg":"successfully loaded all files from filesystem into cache","source":"file_provider","loaded_files":2}
{"level":"debug","ts":"2022-03-30T03:52:25.281Z","msg":"fetched .proto files from filesystem service cache","fetched_proto_files":2}
{"level":"warn","ts":"2022-03-30T03:52:25.281Z","msg":"failed to parse proto file to descriptor","file":"sparkplug.proto","line":5,"error":"sparkplug.proto:5:1: duplicate symbol example.Payload: already defined as message in \"..2022_03_30_03_52_18.309866663/sparkplug.proto\""}
{"level":"warn","ts":"2022-03-30T03:52:25.281Z","msg":"failed to parse proto file to descriptor","file":"sparkplug.proto","line":6,"error":"sparkplug.proto:6:5: duplicate symbol example.Payload.timestamp: already defined as field in \"..2022_03_30_03_52_18.309866663/sparkplug.proto\""}
{"level":"warn","ts":"2022-03-30T03:52:25.281Z","msg":"failed to parse proto file to descriptor","file":"sparkplug.proto","line":7,"error":"sparkplug.proto:7:5: duplicate symbol example.Payload.value: already defined as field in \"..2022_03_30_03_52_18.309866663/sparkplug.proto\""}
{"level":"fatal","ts":"2022-03-30T03:52:25.281Z","msg":"failed to start kafka service","error":"failed to create proto registry: failed to compile proto files to descriptors: failed to parse proto files to descriptors: parse failed: invalid proto source"}

Config

      protobuf:
        enabled: true
        mappings:
          - topicName: test.proto
            valueProtoType: example.Payload
      #     # Map the proto type names for each of your topics. These proto types will be used for deserialization
      #     # - topicName: xy
      #     #   valueProtoType: fake_model.Order # You can specify the proto type for the record key and/or value (just one will work too)
      #     #   keyProtoType: package.Type
      #   # SchemaRegistry does not require any mappings to be specified. The schema registry client configured on the kafka level will be reused.
      #   schemaRegistry:
      #     enabled: true
      #     refreshInterval: 1m
      #   # FileSystem can be configured if you want Kowl to search the local file system for the .proto files
        fileSystem:
          enabled: true
          paths:
            - /mnt/schemas
          refreshInterval: 60m
JSchulte01 commented 2 years ago

Version details: Kowl CI fa13b2d (built March 23, 2022)

weeco commented 2 years ago

Hey @JSchulte01 , this is unfortunately not so easy to debug. Could you provide some proto files + Kowl config so that I can reproduce the issue? Then I can debug it locally.

Thanks, Martin

JSchulte01 commented 2 years ago

I've not been able to create a reproducible example in Docker locally, since this setup is hosted in a Kubernetes cluster. That said, I can reliably reproduce the error in k8s with the following configuration:

Deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-dashboard
  labels:
    name: kafka-dashboard
    app: kafka-dashboard
spec:
  selector:
    matchLabels:
      name: kafka-dashboard
  revisionHistoryLimit: 1
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        name: kafka-dashboard
        app: kafka-dashboard
    spec:
      terminationGracePeriodSeconds: 30
      volumes:
        - name: config
          configMap: 
            name: kafka-dashboard-cfg
        - name: schemas
          configMap: 
            name: protobuf-schemas-cfg
      containers:
        - name: kowl-kafka-dashboard
          image: 'quay.io/cloudhut/kowl:master'
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          volumeMounts:
            - name: config
              mountPath: /mnt/config
            - name: schemas
              mountPath: /mnt/schemas
          env:
            - name: CONFIG_FILEPATH
              value: /mnt/config/kowl-config.yaml

ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: protobuf-schemas-cfg
data:
  sparkplug.proto: |
    syntax = "proto3";

    package maxedge;

    message Payload {
      uint64 timestamp = 1;
      string test = 2;
    }

kowl.yaml

      protobuf:
        enabled: true
        mappings:
          - topicName: test
            valueProtoType: maxedge.Payload
      #     # Map the proto type names for each of your topics. These proto types will be used for deserialization
      #     # - topicName: xy
      #     #   valueProtoType: fake_model.Order # You can specify the proto type for the record key and/or value (just one will work too)
      #     #   keyProtoType: package.Type
      #   # SchemaRegistry does not require any mappings to be specified. The schema registry client configured on the kafka level will be reused.
      #   schemaRegistry:
      #     enabled: true
      #     refreshInterval: 1m
      #   # FileSystem can be configured if you want Kowl to search the local file system for the .proto files
        fileSystem:
          enabled: true
          paths:
            - /mnt/schemas
          refreshInterval: 60m
JSchulte01 commented 2 years ago

It would seem that this is related to how K8s versions configMap mounts using symlinks: kubelet materializes the ConfigMap in a hidden timestamped directory (the ..2022_03_30_03_52_18.309866663 seen in the logs above), adds a ..data symlink to it, and exposes each key as a symlink at the mount root, so a recursive scan picks up every file twice.

There is a workaround, though not an ideal one, which involves mapping each file individually: https://stackoverflow.com/questions/50685385/kubernetes-config-map-symlinks-data-is-there-a-way-to-avoid-them
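For reference, a minimal sketch of that workaround applied to the Deployment above, using one subPath mount per file (names taken from the manifests in this thread). Kubelet copies subPath mounts as plain files, so the ..data symlink structure never appears under /mnt/schemas; the trade-off is that subPath mounts do not pick up later ConfigMap updates.

          volumeMounts:
            - name: config
              mountPath: /mnt/config
            - name: schemas
              # one mount per proto file; the key is projected as a plain file
              mountPath: /mnt/schemas/sparkplug.proto
              subPath: sparkplug.proto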

weeco commented 2 years ago

Thanks for the follow-up and detailed information - highly appreciated!

smatvienko-tb commented 1 year ago

Hey, @weeco! The proto-decoding feature is great, but the issue is still there. The files are imported twice in Kubernetes, and it is painful to map a ConfigMap with many protos.

The problem and root cause are shown in the screenshot below: the root cause is scanning hidden folders.

[screenshot]

Steps to reproduce on any Linux environment:

  1. Create a hidden folder /tmp/protos/..data
  2. Put a proto file in the folder: /tmp/protos/..data/tbmsg.proto
  3. Make a symlink /tmp/protos/tbmsg.proto pointing to /tmp/protos/..data/tbmsg.proto
  4. Start Console with the following config:
kafka:
  protobuf:
    enabled: true
    fileSystem:
      enabled: true
      paths:
        - /tmp/protos
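The resulting layout mirrors what kubelet creates for a ConfigMap mount. A recursive scan that does not skip hidden entries sees tbmsg.proto twice, once through the symlink and once inside the hidden folder, which is exactly what produces the duplicate-symbol errors:

/tmp/protos
├── ..data/                             hidden folder (step 1)
│   └── tbmsg.proto                     the real file (step 2)
└── tbmsg.proto -> ..data/tbmsg.proto   symlink (step 3)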

The fix is to avoid scanning hidden files and folders (the same as the ls command does).

Otherwise, Kubernetes deployments become painful, and some users might give up on the proto-decoding feature because of an issue like this.

I am preparing the Redpanda Console deployment for the ThingsBoard IoT framework, and I don't really want to publish an imperfect deployment to a wide community. Unfortunately, I cannot help you with a good PR because it is not my language at the moment. Let me know if you need me to create a new issue. It would be appreciated if you could implement the fix required for Kubernetes users 🙏.

@pon0marev FYI


weeco commented 1 year ago

Hey @smatvienko-tb , thanks for your detailed comment. This helps me understand the issue.

I think we could add a config option that skips hidden dirs (this requires an OS-specific implementation). Would that work for you? I'll possibly make it opt-in so as not to break existing deployments.
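For illustration, here is a minimal sketch of what such a skip-hidden walk could look like on Unix, assuming the file provider walks the configured paths with Go's filepath.WalkDir. The function name, the skipHidden option, and the overall shape are illustrative, not the actual Console code; Windows would additionally need to check the hidden file attribute.

package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// collectProtoFiles walks root and collects .proto files. With skipHidden
// set, dot-prefixed files and directories (such as the ..data and
// ..2022_03_30... entries kubelet creates) are ignored entirely.
func collectProtoFiles(root string, skipHidden bool) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		// Don't apply the hidden check to the root itself (e.g. root == ".").
		if skipHidden && path != root && strings.HasPrefix(d.Name(), ".") {
			if d.IsDir() {
				return filepath.SkipDir // skip the whole hidden directory
			}
			return nil // skip a hidden file or symlink such as ..data
		}
		if !d.IsDir() && strings.HasSuffix(d.Name(), ".proto") {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	// With the reproduction above, this finds /tmp/protos/tbmsg.proto exactly once.
	files, err := collectProtoFiles("/tmp/protos", true)
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	fmt.Println(files)
}

With skipHidden disabled, the same walk returns both /tmp/protos/tbmsg.proto and /tmp/protos/..data/tbmsg.proto, which is exactly the duplicate-symbol situation from the logs.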

smatvienko-tb commented 1 year ago

Hey @weeco, thank you for your quick response!

For me, any solution will be just fine!

But I do not believe anyone would intentionally inject a hidden file or nested directory as a proto; that has no clear purpose.

Hidden files exist in every OS, and I believe it will not be too much of a headache to implement this. Thank you!