rsyslog / liblognorm

a fast samples-based log normalization library
http://www.liblognorm.com
GNU Lesser General Public License v2.1

Multiple ln_loadSamples calls. #206

Closed: beave closed this issue 8 years ago

beave commented 8 years ago

I'm in the process of getting Sagan working with liblognorm 1.1.3 and the github.com dev tree. I noticed something this evening that I am not sure is an API change or a bug.

In the past with Sagan, I've been able to call ln_loadSamples() multiple times to load multiple sample files. This allows me to "break up" samples by their types and load them only when they are needed.

It appears that this is no longer possible: only the data from the last ln_loadSamples() call remains in memory.

Hopefully this makes sense. The code in question is pretty simple and straightforward. Below is a link to the section in question.

https://github.com/beave/sagan/blob/master/src/sagan-liblognorm.c#L92-L102

Let me know your thoughts.
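
For reference, the pattern looks roughly like this (a simplified sketch, not the exact Sagan code linked above; the rulebase paths are placeholders):

----

#include <stdio.h>
#include <liblognorm.h>

/* Simplified sketch of the multi-load pattern: one shared context,
 * ln_loadSamples() called once per rulebase file. */
static ln_ctx load_rulebases(void)
{
    ln_ctx ctx = ln_initCtx();
    if (ctx == NULL)
        return NULL;

    /* Under liblognorm 1.x each call added its samples to the same
     * context; with the current dev tree only the last call's rules
     * appear to survive. */
    if (ln_loadSamples(ctx, "rules/cisco-normalize.rulebase") != 0)
        fprintf(stderr, "failed to load cisco rulebase\n");
    if (ln_loadSamples(ctx, "rules/openssh-normalize.rulebase") != 0)
        fprintf(stderr, "failed to load openssh rulebase\n");

    return ctx;
}

----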

davidelang commented 8 years ago

On Sat, 7 May 2016, Champ Clark wrote:

I'm in the process of getting Sagan working with liblognorm 1.1.3 and the github.com dev tree. I noticed something this evening that I am not sure is an API change or a bug.

In the past with Sagan, I've been able to call ln_loadSamples() multiple times to load multiple sample files. This allows me to "break up" samples by their types and load them only when they are needed.

It appears that this is no longer possible: only the data from the last ln_loadSamples() call remains in memory.

Hopefully this makes sense. The code in question is pretty simple and straightforward. Below is a link to the section in question.

https://github.com/beave/sagan/blob/master/src/sagan-liblognorm.c#L92-L102

Let me know your thoughts.

I believe that this was a deliberate change in liblognorm v2

There is (or is going to be) an include directive, but all samples must be loaded at once.

David Lang

rgerhards commented 8 years ago

2016-05-08 6:04 GMT+02:00 Champ Clark notifications@github.com:

I'm in the process of getting Sagan working with liblognorm 1.1.3 and the github.com dev tree. I noticed something this evening that I am not sure is an API change or a bug.

In the past with Sagan, I've been able to call ln_loadSamples() multiple times to load multiple sample files. This allows me to "break up" samples by their types and load them only when they are needed.

Sorry to say that, but that was probably a side effect of the old implementation. I wasn't really aware of that ability. It is an important property of the new data structure (the parse DAG, PDAG) that it is immutable after load. If we need this "load multiple files one at a time" functionality, we could probably craft a new API which does so, BUT then we need a "finishLoad" function that does all the necessary work to compile the runtime form; after that, everything will be read-only.
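
Purely as an illustration, such a two-phase interface might look something like this (the names below are hypothetical; neither function exists in liblognorm today):

----

/* HYPOTHETICAL sketch only -- neither function exists in liblognorm.
 * The idea: several incremental loads, then one explicit "finish" step
 * that compiles the read-only runtime PDAG. */
int ln_addSamples(ln_ctx ctx, const char *file);  /* may be called repeatedly */
int ln_finishLoad(ln_ctx ctx);                    /* after this, the PDAG is immutable */

----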

Rainer


beave commented 8 years ago

I'm trying to figure out a work around. Right now we have the following rule base files:

normalize: cisco, $RULE_PATH/cisco-normalize.rulebase
normalize: openssh, $RULE_PATH/openssh-normalize.rulebase
normalize: smtp, $RULE_PATH/smtp-normalize.rulebase
normalize: dns, $RULE_PATH/dns-normalize.rulebase
normalize: imap, $RULE_PATH/imap-normalize.rulebase
normalize: su, $RULE_PATH/su-normalize.rulebase
normalize: vmware, $RULE_PATH/vmware-normalize.rulebase
normalize: linux-kernel, $RULE_PATH/linux-kernel-normalize.rulebase
normalize: windows, $RULE_PATH/windows-normalize.rulebase
normalize: snort, $RULE_PATH/snort-normalize.rulebase
normalize: bro, $RULE_PATH/bro-normalize.rulebase
normalize: nfcapd, $RULE_PATH/nfcapd-normalize.rulebase
normalize: arp, $RULE_PATH/arp-normalize.rulebase
normalize: citrix, $RULE_PATH/citrix-normalize.rulesbase
normalize: fortinet, $RULE_PATH/fortinet-normalize.rulebase
normalize: imperva, $RULE_PATH/imperva-normalize.rulebase
normalize: procurve, $RULE_PATH/procurve-normalize.rulebase
normalize: sonicwall, $RULE_PATH/sonicwall-normalize.rulebase

If a Sagan "rule" wants to normalize a "citrix" log, it can call (within the "rule")

normalize: citrix;

For example, here's a standard Sagan Citrix rule:

alert tcp $EXTERNAL_NET any -> $HOME_NET $HTTPS_PORT (msg: "[CITRIX-GEOIP] AAA LOGIN_FAILED from outside HOME_COUNTRY"; content: "AAA LOGIN_FAILED"; classtype: unsuccessful-user; parse_src_ip: 1; normalize: citrix; country_code: track by_src, isnot $HOME_COUNTRY; fwsam: src, 1 day; reference: url,support.citrix.com/article/CTX123875; reference: url,wiki.quadrantsec.com/bin/view/Main/5002280; sid:5002280; rev:1;)

(Note the "normalize: citrix;" in the middle of the rule)

This means different rules can have different "normalize" options.

I could take all these files and merge them into "one big file". I could then drop the "normalize: cisco;" in favor of just "normalize;" (or whatever). It's not a huge back-end change, but it will break some backward compatibility.

I had originally done it this way so Sagan would load in only the normalization "rulebase" files it needed, based on the Sagan rules enabled. The idea was that I would save a couple of "CPU ticks" by not having to deal with larger/unneeded rulebase files. The questions are:

  1. Is the assumption about saving "CPU ticks" accurate?
  2. Do you guys see a better way than just loading one big "liblognorm" file?
  3. Do you think loading multiple files might return in the future? If so, any ETA?

Thank you for your time.

davidelang commented 8 years ago

On Sun, 8 May 2016, Champ Clark wrote:

I could take all these files and merge them into "one big file". I could then drop the "normalize: cisco;" in favor of just "normalize;" (or whatever). It's not a huge back-end change, but it will break some backward compatibility.

I had originally done it this way so Sagan would load in only the normalization "rulebase" files it needed, based on the Sagan rules enabled. The idea was that I would save a couple of "CPU ticks" by not having to deal with larger/unneeded rulebase files. The questions are:

  1. Is the assumption about saving "CPU ticks" accurate?
  2. Do you guys see a better way than just loading one big "liblognorm" file?
  3. Do you think loading multiple files might return in the future? If so, any ETA?

You save CPU at initialization time, but little, if any, CPU at normalize time.

With the rules being compiled into a parse tree, adding rules adds very little to the tests needed to process a single log entry.

Especially with something like cisco vs. everything else: since all cisco logs are %ASA-*, there is at most one test added to the log processing to eliminate all of those rules.

Now, that's the theory; I would suggest doing some benchmarks to convince yourself.
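
Something like this rough sketch could serve as a starting point for such a benchmark (not Sagan code; the rulebase path and log line are placeholders, and it assumes the json-c headers are installed as json-c/json.h):

----

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <json-c/json.h>
#include <liblognorm.h>

/* Rough micro-benchmark sketch: load one rulebase, normalize the same
 * message many times, print the elapsed time.  Run it against a small
 * rulebase and against the merged "one big file" and compare. */
int main(void)
{
    /* placeholder log line and rulebase path */
    const char *msg = "id=fw01 sn=0017C5... time=\"...\" fw=10.1.1.1 pri=1 c=32 m=83";
    ln_ctx ctx = ln_initCtx();

    if (ctx == NULL || ln_loadSamples(ctx, "merged-normalize.rulebase") != 0) {
        fprintf(stderr, "rulebase load failed\n");
        return 1;
    }

    clock_t start = clock();
    for (int i = 0; i < 1000000; i++) {
        struct json_object *json = NULL;
        ln_normalize(ctx, msg, strlen(msg), &json);
        if (json != NULL)
            json_object_put(json);   /* free the normalized result */
    }
    printf("elapsed: %.2f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);

    ln_exitCtx(ctx);
    return 0;
}

----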

As for backwards compatibility, my knee-jerk reaction is to allow normalize:* to exist in the file and just ignore everything after the ':'.

This does open a small risk of incompatibility, if you had two rules in different rulesets that could match the same line, but that's not very likely.
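
A minimal sketch of that knee-jerk approach, as a hypothetical helper rather than actual Sagan code:

----

#include <string.h>

/* Hypothetical helper, not actual Sagan code: given the body of a
 * "normalize" rule option (e.g. "normalize: citrix"), chop off
 * everything from the first ':' so old rules keep working unchanged. */
static void strip_normalize_ruleset(char *option)
{
    char *colon = strchr(option, ':');
    if (colon != NULL)
        *colon = '\0';   /* "normalize: citrix" -> "normalize" */
}

----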

David Lang

beave commented 8 years ago

I had a feeling that would be the answer on CPU utilization. I'll do some benchmarking and let you know if I see any difference.

From a programmatic standpoint, dealing with one file is a lot easier. :)

For compatibility, I'm going to do exactly that. If a rule has "normalize: cisco;", I'll just ignore the "cisco" section.

One last question. In the rulebase files, can you have multiple "prefix=" lines?

For example:

----

prefix=
rule=: %uptime:word% %authfail:word% Authentication failure for SNMP req from host %src-ip:ipv4%

prefix=id=%firewall:word% sn=%serial:word% time="%date:word% %hour:number%:%minute:number%:%seconds:number%" fw=%fire-ip:ipv4% pri=%pri:number% c=%c:number% m=%m:number%

rule=: msg="Possible port scan detected" n=%n:number% src=%src-ip:ipv4%:%src-port:number%:%interface:word% dst=%dst-ip:ipv4%:%dst-port:number%:%interface:word% note=%ports-scanned:quoted-string%

----

This will be my last question for now. Thank you for the help! I'll close this out after this question! :)

davidelang commented 8 years ago

On Sun, 8 May 2016, Champ Clark wrote:

I had a feeling that would be the answer on CPU utilization. I'll do some benchmarking and let you know if I see any difference.

From a programmatic standpoint, dealing with one file is a lot easier. :)

For compatibility, I'm going to do exactly that. If a rule has "normalize: cisco;", I'll just ignore the "cisco" section.

One last question. In the rulebase files, can you have multiple "prefix=" lines?

absolutely!!

I tend to have rules like

prefix=:%timestamp% %hostname% program[%pid%]:
rule=: Authentication failure
rule=: login by
prefix=:%timestamp% %hostname% program2[%pid%]:
rule=: Authentication failure
rule=: login by

etc

(obviously greatly simplified)

David Lang

beave commented 8 years ago

Perfect! Thanks!