rsyslog / liblognorm

a fast samples-based log normalization library
http://www.liblognorm.com
GNU Lesser General Public License v2.1

UTF-8 accented characters causing segfault #340

Open Rfferrao87 opened 4 years ago

Hi, I'd like to know whether this is an isolated case: some logs containing UTF-8 characters crash lognormalizer and, consequently, rsyslog for me.

Here is my test, followed by the full debug output:

echo 'msg="Supervisão"' | lognormalizer -r sample.rb -vvv

liblognorm: loading rulebase file 'sample.rb'
liblognorm: rulebase version is 2

liblognorm: read rulebase line[~3]: 'rule=:msg=%msg:string%'
liblognorm: rule line to add: ':msg=%msg:string%'
liblognorm: addSampToTree 0 of 16
liblognorm: parsed literal: 'msg='
liblognorm: ln_pdagAddParserInternal: { "type": "literal", "text": "m" }
liblognorm: ln_pdagAddParserInstance: { "type": "literal", "text": "m" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x7380a0, parser 0x738600
liblognorm: ln_pdagAddParserInternal: { "type": "literal", "text": "s" }
liblognorm: ln_pdagAddParserInstance: { "type": "literal", "text": "s" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x7386b0, parser 0x738600
liblognorm: ln_pdagAddParserInternal: { "type": "literal", "text": "g" }
liblognorm: ln_pdagAddParserInstance: { "type": "literal", "text": "g" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x738880, parser 0x7385c0
liblognorm: ln_pdagAddParserInternal: { "type": "literal", "text": "=" }
liblognorm: ln_pdagAddParserInstance: { "type": "literal", "text": "=" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x7387a0, parser 0x738a90
liblognorm: parsed field: 'msg'
liblognorm: field type 'string', i 15
liblognorm: ln_pdagAddParserInternal: { "name": "msg", "type": "string" }
liblognorm: ln_pdagAddParserInstance: { "name": "msg", "type": "string" }, nextnode (nil)
liblognorm: assigned priority is 30000
liblognorm: pdag: 0x738a40, parser 0x738a90
liblognorm: parsed literal: ''
liblognorm: end addSampToTree 16 of 16
liblognorm: optimizing main pdag component
liblognorm: pre sort, parser 0:(null)[7680004]
liblognorm: post sort, parser 0:(null)[7680004]
liblognorm: optimizing 0x7386b0: field 0 type 'literal', name '(null)': 'm':
liblognorm: opt path compact: add 0x738560 to 0x7388d0
liblognorm: delete 0x7386b0[1]: (null)
liblognorm: opt path compact: add 0x738600 to 0x7388d0
liblognorm: delete 0x738880[1]: (null)
liblognorm: opt path compact: add 0x738b20 to 0x7388d0
liblognorm: delete 0x7387a0[1]: (null)
liblognorm: pre sort, parser 0:msg[7680032]
liblognorm: post sort, parser 0:msg[7680032]
liblognorm: optimizing 0x738cb0: field 0 type 'string', name 'msg': 'UNKNOWN':
liblognorm: finished optimizing main pdag component
liblognorm: ---AFTER OPTIMIZATION------------------
liblognorm: MAIN COMPONENT:
liblognorm: subDAG 0x7380a0 (children: 1 parsers, ref 1) [called 0, backtracked 0]
liblognorm: field type 'literal', name '(null)': 'msg=': called 0
liblognorm: field type 'literal', name '(null)': 'msg=':
liblognorm:   subDAG 0x738a40 (children: 1 parsers, ref 1) [called 0, backtracked 0]
liblognorm:   field type 'string', name 'msg': 'UNKNOWN': called 0
liblognorm:   field type 'string', name 'msg': 'UNKNOWN':
liblognorm:     subDAG [TERM] 0x738cb0 (children: 0 parsers, ref 1) [called 0, backtracked 0]
liblognorm: MAIN COMPONENT (alternative):
liblognorm: 0x7380a0[ref 1]:
liblognorm:   0x738a40[ref 1]: msg=
liblognorm:     0x738cb0[ref 1]: msg=%msg:string%
liblognorm: =======================================
number of tree nodes: 6
liblognorm: MAIN COMPONENT:
liblognorm: subDAG 0x7380a0 (children: 1 parsers, ref 1) [called 0, backtracked 0]
liblognorm: field type 'literal', name '(null)': 'msg=': called 0
liblognorm: field type 'literal', name '(null)': 'msg=':
liblognorm:   subDAG 0x738a40 (children: 1 parsers, ref 1) [called 0, backtracked 0]
liblognorm:   field type 'string', name 'msg': 'UNKNOWN': called 0
liblognorm:   field type 'string', name 'msg': 'UNKNOWN':
liblognorm:     subDAG [TERM] 0x738cb0 (children: 0 parsers, ref 1) [called 0, backtracked 0]
liblognorm: MAIN COMPONENT (alternative):
liblognorm: 0x7380a0[ref 1]:
liblognorm:   0x738a40[ref 1]: msg=
liblognorm:     0x738cb0[ref 1]: msg=%msg:string%
To normalize: 'msg="Supervisão"'
liblognorm: 0: enter parser, dag node 0x7380a0, json 0x738910
liblognorm: 0/0:trying 'literal' parser for field '(null)', data 'msg='
liblognorm: parser lookup returns 0, pParsed 4
liblognorm: 0: potential hit, trying subtree 0x738a40
liblognorm: 4: enter parser, dag node 0x738a40, json 0x738910
liblognorm: 4/0:trying 'string' parser for field 'msg', data 'UNKNOWN'
Segmentation fault (core dumped)

The contents of the rulebase file (sample.rb) are:

version=2

rule=:msg=%msg:string%
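
For reference, here is a small Python sketch that lists the bytes of the test message falling outside 7-bit ASCII. It does not diagnose the crash. It only shows what the `string` parser receives once the `msg=` literal has matched at offset 4. It assumes the input was sent as precomposed (NFC) UTF-8, which is why the accented character is written as an explicit escape. The variable names are illustrative only.

```python
# Inspect the UTF-8 bytes of the test message from this report.
# Assumption: the terminal sent the accented character in precomposed
# (NFC) form, U+00E3, so it is written as an explicit escape here.
message = 'msg="Supervis\u00e3o"'
encoded = message.encode("utf-8")

# Collect (byte offset, byte value) for every byte above 0x7F,
# i.e. every byte that is not plain 7-bit ASCII.
non_ascii = [
    (offset, hex(byte))
    for offset, byte in enumerate(encoded)
    if byte > 0x7F
]

print(len(message), "characters,", len(encoded), "bytes")
print("non-ASCII bytes:", non_ascii)
```

Under that assumption, the only non-ASCII bytes are the two that encode the accented character; everything else in the message is plain ASCII.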

Can you help me figure this out?