openconfig / gnmic

gNMIc is a gNMI CLI client and collector
https://gnmic.openconfig.net
Apache License 2.0

gnmic -> aws msk (kafka) -> apache druid #84

Closed richardeaxon closed 1 year ago

richardeaxon commented 1 year ago

Fantastic project!

Having some issues getting data through to Apache Druid. The data arrives just fine, but Druid cannot handle batched messages, or any JSON that is not an object. Sending gnmic data in json format allows me to process the messages in Druid, but I lose all the event-format goodness of gnmic (like jq, event processors, etc.). Is there a way with gnmic to de-batch messages and send exactly one event per message? I am aware message transformations can be done on the Kafka side, but this is not an option at this stage.

An example of an event message generated by gnmic (it's an array of events in one message):

[
  {
    "timestamp": 1680417104877454555,
    "tags": {
      "interface_name": "Ethernet34/1",
      "source": "node1:6030"
    },
    "values": {
      "/interfaces/interface/state/counters/in-octets": 105458054050005
    }
  },
  {
    "timestamp": 1680417104877454555,
    "tags": {
      "interface_name": "Ethernet34/1",
      "source": "node1:6030"
    },
    "values": {
      "/interfaces/interface/state/counters/in-unicast-pkts": 135224098913
    }
  }
]

What I need is the array exploded into one message per event:

message 1:
  {
    "timestamp": 1680417104877454555,
    "tags": {
      "interface_name": "Ethernet34/1",
      "source": "node1:6030"
    },
    "values": {
      "/interfaces/interface/state/counters/in-octets": 105458054050005
    }
  }

message 2:
  {
    "timestamp": 1680417104877454555,
    "tags": {
      "interface_name": "Ethernet34/1",
      "source": "node1:6030"
    },
    "values": {
      "/interfaces/interface/state/counters/in-unicast-pkts": 135224098913
    }
  }

Ideally the "values" should be exploded too, but when using the event output format I have not seen multiple values per event (although the key is "values", it's an object, not an array, so it should hopefully be ok).
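Until gnmic can de-batch on its own, the requested transformation is simple enough to sketch consumer-side. This is a minimal illustration in Python (not part of gnmic; the function name `split_batch` is mine) of turning one batched payload into one Kafka message per event:

```python
import json


def split_batch(payload: bytes) -> list[bytes]:
    """Split a gnmic event-format batch (a JSON array) into one
    JSON-object payload per event, which is what Druid expects."""
    events = json.loads(payload)
    # A batch is a list of event objects; a bare object passes through as-is.
    if isinstance(events, dict):
        events = [events]
    return [json.dumps(event).encode() for event in events]


# A two-event batch like the one above:
batch = json.dumps([
    {"timestamp": 1680417104877454555,
     "tags": {"interface_name": "Ethernet34/1", "source": "node1:6030"},
     "values": {"/interfaces/interface/state/counters/in-octets": 105458054050005}},
    {"timestamp": 1680417104877454555,
     "tags": {"interface_name": "Ethernet34/1", "source": "node1:6030"},
     "values": {"/interfaces/interface/state/counters/in-unicast-pkts": 135224098913}},
]).encode()

messages = split_batch(batch)  # two payloads, each a single JSON object
```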

karimra commented 1 year ago

Would you be ok with a processor that splits the event list into multiple lists with a single message each (possibly using starlark)?

The output will still be a list of messages, but of length 1. Meaning:

[
  {
    "timestamp": 1680417104877454555,
    "tags": {
      "interface_name": "Ethernet34/1",
      "source": "node1:6030"
    },
    "values": {
      "/interfaces/interface/state/counters/in-octets": 105458054050005
    }
  },
  {
    "timestamp": 1680417104877454555,
    "tags": {
      "interface_name": "Ethernet34/1",
      "source": "node1:6030"
    },
    "values": {
      "/interfaces/interface/state/counters/in-unicast-pkts": 135224098913
    }
  }
]

becomes

# event1:
[
  {
    "timestamp": 1680417104877454555,
    "tags": {
      "interface_name": "Ethernet34/1",
      "source": "node1:6030"
    },
    "values": {
      "/interfaces/interface/state/counters/in-octets": 105458054050005
    }
  }
]

# event2:
[
  {
    "timestamp": 1680417104877454555,
    "tags": {
      "interface_name": "Ethernet34/1",
      "source": "node1:6030"
    },
    "values": {
      "/interfaces/interface/state/counters/in-unicast-pkts": 135224098913
    }
  }
]

It's possible to get multiple values in a single message when using JSON or JSON_IETF encoding and the router bundles multiple leaves in a single JSON value.

You can separate values into multiple messages with the starlark processor.
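The value-splitting logic such a processor would apply can be sketched in plain Python (gnmic's starlark scripts use a Python-like syntax, but the exact script entry point is the processor's concern; `split_values` here is just an illustrative name for the logic, assuming events shaped like the examples above):

```python
def split_values(event: dict) -> list[dict]:
    """Produce one event per entry in 'values', copying
    the timestamp and tags into each new event."""
    return [
        {"timestamp": event["timestamp"],
         "tags": dict(event["tags"]),  # copy so events don't share state
         "values": {path: value}}
        for path, value in event["values"].items()
    ]


# An event carrying two values, as can happen with JSON/JSON_IETF encoding:
event = {
    "timestamp": 1680417104877454555,
    "tags": {"interface_name": "Ethernet34/1", "source": "node1:6030"},
    "values": {
        "/interfaces/interface/state/counters/in-octets": 105458054050005,
        "/interfaces/interface/state/counters/in-unicast-pkts": 135224098913,
    },
}

split = split_values(event)  # two events, one value each
```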

richardeaxon commented 1 year ago

Starlark is nice, but unfortunately Apache Druid insists on objects only, not lists/arrays.

jrametta commented 1 year ago

This would be useful; currently we are using a stream processor in Kafka to remove the brackets prior to ingesting.

karimra commented 1 year ago

I can add a config attribute to send the event messages individually instead of together as an array. The config attribute would live under each output section (file, kafka, nats, jetstream, tcp and udp), not in a processor.
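A rough sketch of how such an output-level option might look in a gnmic config file. Note the attribute name `split-events` is purely illustrative here, as are the broker address and topic; the comment above proposes the feature but does not name it:

```yaml
outputs:
  kafka-output:
    type: kafka
    # hypothetical broker address and topic
    address: broker1:9092
    topic: gnmic-events
    format: event
    # illustrative attribute name: emit each event as its own
    # message instead of batching them into a JSON array
    split-events: true
```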