openconfig / gnmic

gNMIc is a gNMI CLI client and collector
https://gnmic.openconfig.net
Apache License 2.0

"delete" messages not hitting processors #211

Closed gsl-rosst closed 1 year ago

gsl-rosst commented 1 year ago

I want to filter out "delete" events, but no processor I've tried has matched them. I even wrote a Starlark processor that prints out a particular tag as well as the event.deletes object, and 1) I never see the tag corresponding to the deletion I'm triggering, and 2) the event.deletes object printed is always [].

I suspect that delete events simply bypass the processors somehow? This is a big issue for me as the code receiving these events is expecting a "values" field. Please, let me know if there's a way to filter them.

Thanks.

karimra commented 1 year ago

Can you share the config file you are using? What do the event messages with deletes look like if you write them to a file output?

gsl-rosst commented 1 year ago

I have used probably about 100 different config files at this point.

I am sending everything to kafka. I've set up a test topic where I am using kcat and grep to print relevant messages. The messages look like this:

{"name":"x","timestamp":1693291893098256936,"tags":{"source":"x","subscription-name":"x"},"deletes":["path[name=UniqueNameOfThingThatIsDeleted]"]}

A normal message looks like this:

{"name":"x","timestamp":1690517596717884351,"tags":{"identifier":"x","source":"x","subscription-name":"x"},"values":{"x":0}}

I have not been able to find a filter that removes the former messages. I tried event-allow/event-drop with conditions using jq, I tried event-jq, and most recently I tried event-starlark. The latter allows me to print out information about each event which is how I determined that the delete event is not even hitting the processor.

karimra commented 1 year ago

Could you share an example of a processor you used that you think should be working but is not? If you can print messages with deletes using a starlark processor, that means the event is hitting the processor... it's just that the condition to skip/delete it is not right.

gsl-rosst commented 1 year ago

> Could you share an example of a processor you used that you think should be working but is not?

Some examples:

  no_deletes:
    event-drop:
      condition: has("deletes")

  no_deletes:
    event-drop:
      condition: (.deletes | length) > 0

  require_values:
    event-allow:
      condition: has("values")

I'm recreating these from memory - I tried various combinations of quotes and no quotes as well.

> If you can print messages with deletes using a starlark processor, that means the event is hitting the processor...

Indeed, but I can't. Every message I print has an empty array of deletes. None of the events that actually have deletes show up in the output.

EDIT: It would be really helpful if there were some more logging for these processors. As it is, they either work or they don't, and it's very difficult to troubleshoot them if they don't.

gsl-rosst commented 1 year ago

Thanks, this is working.

karimra commented 1 year ago

Thanks for testing before release, this is helpful.

karimra commented 1 year ago

Would it make sense for your use case to change the deletes format to be a list of paths without keys and move the keys to tags? Just like it's done for updates (values). Of course, in that case a gNMI notification with both Update and Delete will have to be split into multiple event messages so that tags don't overlap.
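
To make the proposal concrete, here is a rough Python sketch of that transformation applied to a single deletes entry. It is illustrative only and not gNMIc code: the `split_delete_path` helper is hypothetical, and the `<element>_<key>` tag naming is an assumption about how the moved keys might be named.

```python
import re

# One path element followed by zero or more [key=value] groups.
# Assumption: key values do not contain an escaped "]".
_ELEM = re.compile(r"([^/\[\]]+)((?:\[[^\]]*\])*)")
_KEY = re.compile(r"\[([^=\]]+)=([^\]]*)\]")

def split_delete_path(path):
    """Return (path_without_keys, tags) for a keyed path string."""
    elems, tags = [], {}
    for name, keys in _ELEM.findall(path):
        elems.append(name)
        for key, value in _KEY.findall(keys):
            # Hypothetical naming scheme: "<element>_<key>".
            tags[f"{name}_{key}"] = value
    prefix = "/" if path.startswith("/") else ""
    return prefix + "/".join(elems), tags

bare_path, key_tags = split_delete_path("path[name=UniqueNameOfThingThatIsDeleted]")
```

With the delete entry from earlier in this thread, `bare_path` would be the path with the `[name=...]` key stripped, and `key_tags` would hold that key, ready to be merged into the event's tags.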

gsl-rosst commented 1 year ago

I do think that makes more sense/is more consistent, but it doesn't really make a difference to me as I'm not processing the deletes. If a component goes away, I still need to know the historical values of it, so I don't discard data or take any action on delete.

The reason this came up at all was that I'm ingesting this data into a database, and the JSON parsing failed if there was no "values" field. The database would continue failing on that message, the kafka queue would back up, events would eventually drop...so now that I can filter them out it's not an issue.
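
For anyone hitting the same failure downstream, a consumer-side guard is another option. The following is a minimal Python sketch, not part of gNMIc: `event_with_values` is a hypothetical helper, it assumes one JSON object per message as in the samples above, and the two sample strings are the messages quoted earlier in this thread.

```python
import json

def event_with_values(raw_line):
    """Decode one event; return it only if it has a non-empty "values" object."""
    event = json.loads(raw_line)
    values = event.get("values")
    if isinstance(values, dict) and values:
        return event
    # Delete-only events (no "values" field) are skipped instead of failing.
    return None

samples = [
    '{"name":"x","timestamp":1693291893098256936,"tags":{"source":"x","subscription-name":"x"},"deletes":["path[name=UniqueNameOfThingThatIsDeleted]"]}',
    '{"name":"x","timestamp":1690517596717884351,"tags":{"identifier":"x","source":"x","subscription-name":"x"},"values":{"x":0}}',
]

kept = [e for e in map(event_with_values, samples) if e is not None]
```

Only the second sample survives the guard, so the delete-only message never reaches the database parser.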

gsl-rosst commented 1 year ago

Not related but also worth noting - when I compiled and ran the latest from main, I had the unpleasant surprise that float values are now returned as an actual float instead of int precision and int digits (9c0a1de). This was a breaking change and forced me to change a good deal of database schema. It would be good if the release notes mentioned breaking changes like this, would have saved me some time tracking it down.
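
For reference, the old and new representations are related by value = digits / 10^precision, which is how a gNMI Decimal64 pair is defined. A small Python sketch of that conversion follows; `decimal64_to_float` is a hypothetical helper name, and the exact field layout of the older event output is not shown in this thread, so treat the argument names as assumptions.

```python
from decimal import Decimal

def decimal64_to_float(digits, precision):
    """Convert a digits/precision pair to a float: digits / 10**precision."""
    # scaleb shifts the decimal exponent, avoiding intermediate float rounding.
    return float(Decimal(digits).scaleb(-precision))

example = decimal64_to_float(12345, 2)
```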

karimra commented 1 year ago

> Not related but also worth noting - when I compiled and ran the latest from main, I had the unpleasant surprise that float values are now returned as an actual float instead of int precision and int digits (9c0a1de). This was a breaking change and forced me to change a good deal of database schema. It would be good if the release notes mentioned breaking changes like this, would have saved me some time tracking it down.

Yes, my bad. I don't write changelogs for minor releases. I probably should...

gsl-rosst commented 1 year ago

At least if there's a breaking change :)

gsl-rosst commented 1 year ago

Actually - I did think of a use-case for delete. For descriptions, when they are removed (I think) it comes through as a delete. In that case I would want to positively set that value in the database as an empty string. In such a case, it would be helpful if the keys were in the tags like they are for updates.