vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0

Add `parse_influxdb` function #976

Closed jorgehermo9 closed 3 weeks ago

jorgehermo9 commented 1 month ago

With the merge of https://github.com/vectordotdev/vector/pull/19637, we thought of using that decoder instead of a custom Lua one (which takes a lot of CPU). However, it seems that we have malformed line protocol messages, and using the decoding.codec=influxdb option in sources does not allow us to route the malformed messages somewhere else, as the data is simply dropped.

The alternative we see is to use a remap transform with a parse_influxdb function and drop_on_abort=true, so we can route the malformed data (<component_id>.dropped) to our custom lua decoder which is less strict.
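A rough sketch of what that setup could look like, to make the idea concrete. Everything here is an assumption rather than working config: `parse_influxdb` does not exist yet, so its fallible signature and return shape are guessed from the other parse_* functions, and the component names (`my_source`, `parse_lp`, `lua_fallback`) are placeholders. The explicit `abort` is what `drop_on_abort` reacts to, and `reroute_dropped` is what exposes the `.dropped` output.

```toml
# Hypothetical: parse_influxdb is the function proposed in this issue.
[transforms.parse_lp]
type = "remap"
inputs = ["my_source"]        # placeholder source name
drop_on_abort = true          # drop events that hit `abort`
reroute_dropped = true        # send dropped events to `parse_lp.dropped`
source = '''
# Assumed fallible signature, like the other parse_* functions.
parsed, err = parse_influxdb(.message)
if err != null {
  abort
}
.parsed = parsed
'''

# Malformed messages fall through to the less strict custom decoder.
[transforms.lua_fallback]
type = "lua"
version = "2"
inputs = ["parse_lp.dropped"]
hooks.process = "process"
source = '''
function process(event, emit)
  -- the lenient custom decoding would go here
  emit(event)
end
'''
```

Well-formed messages would continue through the default `parse_lp` output, so only the malformed ones pay the cost of the Lua transform.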

In order to handle these failing line protocol messages, we think that a parse_influxdb VRL function is needed. The implementation should be very similar to https://github.com/vectordotdev/vector/blob/210ff0925d391213556f07bf6ce621967f0368ca/lib/codecs/src/decoding/format/influxdb.rs#L97

Question: The source decoder option is decoding.codec=influxdb and not decoding.codec=line_protocol. Should we call this function parse_influxdb in order to be consistent with the Vector option? Or should we change the Vector config spec to use decoding.codec=line_protocol?

jszwedko commented 1 month ago

Agreed, most of our codecs have analogues in VRL for cases where people want more control. We've also previously discussed having sources be able to route events that fail codec parsing to another output, which I think would also help here, but that is a bigger change and I think we'd want this VRL function still anyway.

I think we should call this parse_influxdb to match the codec name.

The existing parse_* functions can be used as an example if you or anyone else wants to take a shot at this.

jorgehermo9 commented 1 month ago

> We've also previously discussed having sources be able to route events that fail codec parsing to another output,

Yes! That is what I was initially looking for, and I think it would be a very useful feature, as we wouldn't have to use this additional remap transform.

> The existing parse_* functions can be used as an example if you or anyone else wants to take a shot at this.

We are very interested in this feature, so I could address it myself soon. I took a look at the other parse_* functions and it does not seem too complicated to glue the influxdb_line_protocol crate into a new one.

Thanks!!