rsyslog / liblognorm

a fast samples-based log normalization library
http://www.liblognorm.com
GNU Lesser General Public License v2.1
99 stars 64 forks source link

[IMPROVEMENT] add separator character option to 'name-value-list' parser #335

Closed frikilax closed 3 years ago

frikilax commented 4 years ago

Description

An extradata parameter is now available for the experimental parser name-value-list, this parameter allows to set a different character as separator between key/value couples.

What's new

Before, this parser could only handle logs like a=1 b=2 c=text_without_spaces

But with this update, a parameter 'separator' can be specified to act as the matching separator between key/value couples, and a parameter 'assignator' as a separator between key and value.

for example:

rule=:%keyval:name-value-list:|%

or

rule=:%{"name": "keyval", "type": "name-value-list", "separator": "|"}%

are equivalent (by default, legacy 'extradata' is assigned to the 'separator' parameter in the first case) the matching samples will then be: a=1|b=2|c=text_without_spaces|d=text with spaces key=42|other_key=hello there|yet-another.key=i'm a value|

When 'assignator' is left by default, it will match '=' and will check for valid naming of keys (allowing only alphanumeric characters, '.', '-' and '_'). In the case someone wants to match another character by setting 'assignator', the parser will loosen the restriction and only cut keys and values by separating between 'assignator' and 'separator'. However, key name validation will NOT occur !

Rulebase with 'assignator' defined:

rule=:%{"name": ".", "type": "name-value-list", "separator": "|", "assignator":"="}%

examples of matching sample: a=1|key_two=hello world|thirdkey:=a valid value containing an \| escaped pipe||fourth#key=42 and result:

{
  "a": "1",
  "key_two": "hello world",
  "thirdkey:": "a valid value containing an \\| escaped pipe",
  "fourth#key": "42"
}

One can note character escaping will still be respected.

Credits

thanks to @KGuillemot for the help thanks to @bdolez for the original work on quoting (see #234)

davidelang commented 4 years ago

Thank you for this work, I was fighting this just the other day and thinking that I needed to dig into it again.

rather than using extradata, I would suggest using the new json support in the v2 syntax

so instead of %key:name-value-list:|% do %key:name-value-list:{"separator":"|"}%

that way we can add 'assignment' (instead of it always being '=' allow for lists like foo:bar,baz:abc)

also, can this support a quoted string as the value? This would possibly be a third parameter "valuetype"

I can also see us adding 'ignore-whitespace' as an optional parameter.

these things don't need to be added now, but rather than using 'extradata', I'd like us to use a named field to support adding the others later without the separator being such an oddball.

if this only accepts a single character separator, it should complain if you feed it more. what would it take to support longer separators?

David Lang

On Tue, 17 Mar 2020, frikilax wrote:

Date: Tue, 17 Mar 2020 03:00:59 -0700 From: frikilax notifications@github.com Reply-To: rsyslog/liblognorm reply@reply.github.com To: rsyslog/liblognorm liblognorm@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [rsyslog/liblognorm] [IMPROVEMENT] add separator character option to 'name-value-list' parser (#335)

Description

An extradata parameter is now available for the experimental perser name-value-list, this parameter allows to set a different character as separator between key/value couples.

What's new

Before, this parser could only handle logs like a=1 b=2 c=text_without_spaces

But with this parameter, a character can be specified to act as the matching separator between key/value couples: rulesbase configuration:

rule=:%keyval:name-value-list:|%

or

rule=:%{"name": "keyval", "type": "name-value-list", "extradata": "|"}%

the sample: a=1|b=2|c=text_without_spaces|d=text with spaces

Details

This change only takes the first character of your extradata field, so

rule=:%keyval:name-value-list:test%

would not throw an error, but would split only on "t"

Credits

thanks to @KGuillemot for the help You can view, comment on, or merge this pull request online at:

https://github.com/rsyslog/liblognorm/pull/335

-- Commit Summary --

  • name-value-list type:: add optional separator to override default space separation between key/value instances
  • FIX:: compilation errors

-- File Changes --

M src/parser.c (46) M src/parser.h (2) M src/pdag.c (2)

-- Patch Links --

https://github.com/rsyslog/liblognorm/pull/335.patch https://github.com/rsyslog/liblognorm/pull/335.diff

frikilax commented 4 years ago

Hey @davidelang! Thanks for the insights!

We were going to give this as a simple improvement as we didn't need more yet, but after coming across some other problems, I think we'll still add support for a separator and an assignement character.

I think I will also add errors when those fields are more than 1 char, to give some feedback to the users (as it's easy/fast to add).

We also encountered problems with escaped characters, so I will also add support to ignored matched characters that are escaped.

However, quoted strings and multibytes separators might need a bit more work ,which we don't have at the moment sorry...

frikilax commented 4 years ago

quotes support has been added

bdolez commented 4 years ago

Hi,

I did a PR (https://github.com/rsyslog/liblognorm/pull/234) a long time ago on the same topic but I can no longer update it. I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ?

Benoit

frikilax commented 4 years ago

Hi,

I did a PR (#234) a long time ago on the same topic but I can no longer update it. I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ?

Benoit

This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.

bdolez commented 4 years ago

Hi, I did a PR (#234) a long time ago on the same topic but I can no longer update it. I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ? Benoit

This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.

My PR is probably dead because I can't update it. I propose you take the only commit (cherry-pick) and add your feature (change sep, change assignator & config) before PR. The commit was successuly accepted&compiled and your propositions do not change algo. They will be accepted more easily.

Benoit

frikilax commented 4 years ago

Hi, I did a PR (#234) a long time ago on the same topic but I can no longer update it. I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ? Benoit

This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.

My PR is probably dead because I can't update it. I propose you take the only commit (cherry-pick) and add your feature (change sep, change assignator & config) before PR. The commit was successuly accepted&compiled and your propositions do not change algo. They will be accepted more easily.

Benoit

Frankly, what is done in your PR has been added more or less in this PR, you can check this commit However, I realise we may have more or less been inspired by your PR while creating this one at the time, so I should credit you in this one

bdolez commented 4 years ago

My goal is not credit but having such feature merged into main branch and I wish it in the next release.

You treat the '\' and allow changing separator, it's better than mine.

To make it accepted, I recommand send only clean commits (merge errors & repush with --force). There is some coding-style errors. I think you sould replace "ass" variable.

In the last commit, you can group all specific "quoting" code into one "if" terminated with "goto". It make a more lisible commit.

bdolez commented 4 years ago

Hi Rainer,

As you tell us in #234 ("if you don't hear back pls ping me"), do you have some new about this PR & state of your centos7 buildbot ?

Regards

Benoit

bdolez commented 4 years ago

Hi @frikilax , any updates about this PR ?

bdolez commented 3 years ago

Hi, any updates ?

rgerhards commented 3 years ago

That PR needs a fresh push to restart the current CI. If this will not happen, I'll branch it off to a new branch, submit that as PR and see what happens. Feel free to remind me next Tuesday if nothing happend until then.

frikilax commented 3 years ago

Sorry, didn't have time to come back to this PR and completely forgot about it, I will try to get back to it before next Monday