Closed frikilax closed 3 years ago
Thank you for this work, I was fighting this just the other day and thinking that I needed to dig into it again.
rather than using extradata, I would suggest using the new json support in the v2 syntax
so instead of %key:name-value-list:|% do %key:name-value-list:{"separator":"|"}%
that way we can add 'assignment' (instead of it always being '=' allow for lists like foo:bar,baz:abc)
also, can this support a quoted string as the value? This would possibly be a third parameter "valuetype"
I can also see us adding 'ignore-whitespace' as an optional parameter.
these things don't need to be added now, but rather than using 'extradata', I'd like us to use a named field to support adding the others later without the separator being such an oddball.
if this only accepts a single character separator, it should complain if you feed it more. what would it take to support longer separators?
David Lang
On Tue, 17 Mar 2020, frikilax wrote:
Date: Tue, 17 Mar 2020 03:00:59 -0700 From: frikilax notifications@github.com Reply-To: rsyslog/liblognorm reply@reply.github.com To: rsyslog/liblognorm liblognorm@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: [rsyslog/liblognorm] [IMPROVEMENT] add separator character option to 'name-value-list' parser (#335)
Description
An extradata parameter is now available for the experimental perser name-value-list, this parameter allows to set a different character as separator between key/value couples.
What's new
Before, this parser could only handle logs like
a=1 b=2 c=text_without_spaces
But with this parameter, a character can be specified to act as the matching separator between key/value couples: rulesbase configuration:
rule=:%keyval:name-value-list:|%
or
rule=:%{"name": "keyval", "type": "name-value-list", "extradata": "|"}%
the sample:
a=1|b=2|c=text_without_spaces|d=text with spaces
Details
This change only takes the first character of your extradata field, so
rule=:%keyval:name-value-list:test%
would not throw an error, but would split only on "t"
Credits
thanks to @KGuillemot for the help You can view, comment on, or merge this pull request online at:
https://github.com/rsyslog/liblognorm/pull/335
-- Commit Summary --
- name-value-list type:: add optional separator to override default space separation between key/value instances
- FIX:: compilation errors
-- File Changes --
M src/parser.c (46) M src/parser.h (2) M src/pdag.c (2)
-- Patch Links --
https://github.com/rsyslog/liblognorm/pull/335.patch https://github.com/rsyslog/liblognorm/pull/335.diff
Hey @davidelang! Thanks for the insights!
We were going to give this as a simple improvement as we didn't need more yet, but after coming across some other problems, I think we'll still add support for a separator and an assignement character.
I think I will also add errors when those fields are more than 1 char, to give some feedback to the users (as it's easy/fast to add).
We also encountered problems with escaped characters, so I will also add support to ignored matched characters that are escaped.
However, quoted strings and multibytes separators might need a bit more work ,which we don't have at the moment sorry...
quotes support has been added
Hi,
I did a PR (https://github.com/rsyslog/liblognorm/pull/234) a long time ago on the same topic but I can no longer update it. I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ?
Benoit
Hi,
I did a PR (#234) a long time ago on the same topic but I can no longer update it. I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ?
Benoit
This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.
Hi, I did a PR (#234) a long time ago on the same topic but I can no longer update it. I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ? Benoit
This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.
My PR is probably dead because I can't update it. I propose you take the only commit (cherry-pick) and add your feature (change sep, change assignator & config) before PR. The commit was successuly accepted&compiled and your propositions do not change algo. They will be accepted more easily.
Benoit
Hi, I did a PR (#234) a long time ago on the same topic but I can no longer update it. I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ? Benoit
This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.
My PR is probably dead because I can't update it. I propose you take the only commit (cherry-pick) and add your feature (change sep, change assignator & config) before PR. The commit was successuly accepted&compiled and your propositions do not change algo. They will be accepted more easily.
Benoit
Frankly, what is done in your PR has been added more or less in this PR, you can check this commit However, I realise we may have more or less been inspired by your PR while creating this one at the time, so I should credit you in this one
My goal is not credit but having such feature merged into main branch and I wish it in the next release.
You treat the '\' and allow changing separator, it's better than mine.
To make it accepted, I recommand send only clean commits (merge errors & repush with --force). There is some coding-style errors. I think you sould replace "ass" variable.
In the last commit, you can group all specific "quoting" code into one "if" terminated with "goto". It make a more lisible commit.
Hi Rainer,
As you tell us in #234 ("if you don't hear back pls ping me"), do you have some new about this PR & state of your centos7 buildbot ?
Regards
Benoit
Hi @frikilax , any updates about this PR ?
Hi, any updates ?
That PR needs a fresh push to restart the current CI. If this will not happen, I'll branch it off to a new branch, submit that as PR and see what happens. Feel free to remind me next Tuesday if nothing happend until then.
Sorry, didn't have time to come back to this PR and completely forgot about it, I will try to get back to it before next Monday
Description
An extradata parameter is now available for the experimental parser name-value-list, this parameter allows to set a different character as separator between key/value couples.
What's new
Before, this parser could only handle logs like
a=1 b=2 c=text_without_spaces
But with this update, a parameter 'separator' can be specified to act as the matching separator between key/value couples, and a parameter 'assignator' as a separator between key and value.
for example:
or
are equivalent (by default, legacy 'extradata' is assigned to the 'separator' parameter in the first case) the matching samples will then be:
a=1|b=2|c=text_without_spaces|d=text with spaces
key=42|other_key=hello there|yet-another.key=i'm a value|
When 'assignator' is left by default, it will match '=' and will check for valid naming of keys (allowing only alphanumeric characters, '.', '-' and '_'). In the case someone wants to match another character by setting 'assignator', the parser will loosen the restriction and only cut keys and values by separating between 'assignator' and 'separator'. However, key name validation will NOT occur !
Rulebase with 'assignator' defined:
examples of matching sample:
a=1|key_two=hello world|thirdkey:=a valid value containing an \| escaped pipe||fourth#key=42
and result:One can note character escaping will still be respected.
Credits
thanks to @KGuillemot for the help thanks to @bdolez for the original work on quoting (see #234)