vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

parse_groks / parse_grok inconsistent behavior wrt datadog #13423

Open shaeqahmed opened 2 years ago

shaeqahmed commented 2 years ago

Problem

Vector's parse_grok/parse_groks functionality seems to behave similarly to Datadog's, but for a basic example copied from the docs, the result is unexpected.

Log

user=john connect_date=11/08/2017 id=123 action=click

Vector output (using VRL cli)

$ parse_groks!("user=john connect_date=11/08/2017", ["%{data::keyvalue}"])

{  }

Datadog output

{
  "user": "john",
  "id": 123,
  "action": "click"
}

Expected output

{
  "user": "john",
  "connect_date": "11/08/2017",    <--- not sure about this one, maybe it should be dropped like Datadog does
  "id": 123,
  "action": "click"
}

Pretty sure this is a bug: removing the connect_date field fixes the issue with the keyvalue parser, and dropping all the fields seems odd. I am new to Vector.

Version

vector 0.22.2 (x86_64-apple-darwin 0024c92 2022-06-15)

Example Data

user=john connect_date=11/08/2017 id=123 action=click

neuronull commented 2 years ago

Hi @shaeqahmed, I believe the forward slashes inside the connect_date value need to be whitelisted in the grok pattern.

This seemed to work for me (in case you were blocked by this):

$ parse_groks!("user=john connect_date=11/08/2017 id=123 action=click", [s'%{data::keyvalue("=", "/:")}'])
{ "action": "click", "connect_date": "11/08/2017", "id": 123, "user": "john" }

However, your point about vector dropping all fields where datadog is only dropping the connect_date field is still valid.

It seems there are other cases where the vector behavior does not match that of datadog:

The Datadog example of setting a custom separator does not work as-is in Vector:

parse_groks!("user: john connect_date: 11/08/2017 id: 123 action: click", [s'%{data::keyvalue(": ")}'])

But this modification to it does:

parse_groks!("user: john connect_date: 11/08/2017 id: 123 action: click", [s'%{data::keyvalue(": ", "/")}'])
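The two behaviors discussed above (rejecting the whole line versus dropping only the offending pair) can be illustrated with a small Python sketch. This is not Vector's or Datadog's implementation: the function names, the default value character set, and the matching strategy are all assumptions made purely for illustration, and values are left as strings rather than converted to numbers.

```python
import re


def kv_pattern(extra_chars: str = "") -> str:
    # Hypothetical default: values may contain word characters and '.',
    # plus whatever extra characters the caller allowlists.
    value = r"[\w." + re.escape(extra_chars) + r"]+"
    return r"(\w+)=(" + value + r")"


def parse_strict(line: str, extra_chars: str = "") -> dict:
    """All-or-nothing: the entire line must be space-separated key=value pairs."""
    pair = kv_pattern(extra_chars)
    if not re.fullmatch(rf"{pair}(?: {pair})*", line):
        return {}
    return dict(re.findall(pair, line))


def parse_lenient(line: str, extra_chars: str = "") -> dict:
    """Per-token: keep the pairs that match and drop the ones that do not."""
    pair = kv_pattern(extra_chars)
    out = {}
    for token in line.split(" "):
        m = re.fullmatch(pair, token)
        if m:
            out[m.group(1)] = m.group(2)
    return out


line = "user=john connect_date=11/08/2017 id=123 action=click"
print(parse_strict(line))         # nothing survives: '/' is not allowlisted
print(parse_strict(line, "/:"))   # all four pairs once '/' is allowlisted
print(parse_lenient(line))        # only connect_date is dropped
```

In this sketch the strict variant mirrors the empty result reported for Vector, and the lenient variant mirrors the Datadog output shown in the issue, where only connect_date is missing.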

shaeqahmed commented 2 years ago

Discovered another issue with parse_groks:

> vector --version
vector 0.24.1 (x86_64-apple-darwin 8935681 2022-09-12)

> vector vrl
...
$ parse_groks!("4127 Register", ["%{NUMBER:.zeek.sip.sequence.number}"])
function call error for "parse_groks" at (0:70): unable to parse grok: value does not match any rule

$ parse_grok!("4127 Register", "%{NUMBER:.zeek.sip.sequence.number}")
{ ".zeek.sip.sequence.number": "4127" }

I believe parse_grok was migrated internally to call parse_groks with a single [pattern], so it seems like a bug that one works and the other does not. Also, isn't the parse_groks fn supposed to default to nested values? If so, this should return something like:

{ ".zeek.": { "sip": {"sequence": { "number": "4127" } } } }
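As a rough illustration of the nesting being asked for, the Python sketch below expands a dotted field name into nested objects. It is not how Vector handles field paths: the helper name `nest` is hypothetical, and treating the leading dot as a root marker (rather than as part of the first key) is an assumption.

```python
def nest(path: str, value):
    """Expand a dotted path such as '.a.b.c' into nested dicts.

    Assumption: empty segments (for example from a leading dot) are ignored.
    """
    keys = [k for k in path.split(".") if k]
    result = value
    # Build from the innermost key outward.
    for key in reversed(keys):
        result = {key: result}
    return result


print(nest(".zeek.sip.sequence.number", "4127"))
```

Under these assumptions the flat key from the parse_grok output becomes a four-level object ending in "number": "4127".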

@neuronull can you please take a look? Thanks!

neuronull commented 2 years ago

> Discovered another issue with parse_groks:
>
> @neuronull can you please take a look? Thanks!

Hi @shaeqahmed ! Thanks for flagging this... would you mind opening a new issue with these details and we will get that triaged?

I believe this original issue you filed still has standalone value for tracking this problem:

> However, your point about vector dropping all fields where datadog is only dropping the connect_date field is still valid.

shaeqahmed commented 2 years ago

Gotcha, thanks. I'll open a separate ticket.

uksza commented 1 month ago

Hi, here are more examples of inconsistency:

vector -V
vector 0.41.1 (x86_64-unknown-linux-gnu 745babd 2024-09-11 14:55:36.802851761)

$ parse_grok!("/abc-arst-11323-arstars/err.txt", "/(?<test01>.[^/]*)/?")
{ "test01": "abc-arst-11323-arstars" }

$ parse_groks!("/abc-arst-11323-arstars/err.txt", patterns:[ "/(?<test01>.[^/]*)/?" ] )
function call error for "parse_groks" at (0:85): unable to parse grok: value does not match any rule

$ parse_groks!( "/123123/", patterns: [ "/%{NUMBER:nn:integer}/"]  )
{ "nn": 123123 }

$ parse_grok!( "/123123/", "/%{NUMBER:nn:integer}/")
{ "nn:integer": "123123" }
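One possible reading of the match failures above is a difference between matching a pattern anywhere in the input and requiring it to cover the whole input. Whether parse_grok and parse_groks actually differ in this way is an assumption that this thread does not confirm. The Python sketch below only demonstrates the general regex behavior, using Python's `(?P<name>...)` syntax for named groups and `\d+` as a simplified stand-in for NUMBER. It does not address the `nn:integer` type-conversion difference.

```python
import re


def first_match(pattern: str, text: str):
    """Unanchored: succeeds if the pattern matches anywhere in the text."""
    m = re.search(pattern, text)
    return m.groupdict() if m else None


def whole_match(pattern: str, text: str):
    """Anchored: succeeds only if the pattern consumes the entire text."""
    m = re.fullmatch(pattern, text)
    return m.groupdict() if m else None


PATH = r"/(?P<test01>.[^/]*)/?"
NUMBER = r"(?P<number>\d+)"  # simplified stand-in for the grok NUMBER pattern

print(first_match(PATH, "/abc-arst-11323-arstars/err.txt"))  # matches the prefix
print(whole_match(PATH, "/abc-arst-11323-arstars/err.txt"))  # None: 'err.txt' is left over
print(first_match(NUMBER, "4127 Register"))                  # matches '4127'
print(whole_match(NUMBER, "4127 Register"))                  # None: ' Register' is left over
print(whole_match(r"/(?P<nn>\d+)/", "/123123/"))             # matches: pattern covers the input
```

The inputs that fail under the anchored variant are the same ones that parse_groks rejects in the examples, and the one that succeeds is the one whose pattern spans the full string.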