syslog-ng / syslog-ng

syslog-ng is an enhanced log daemon, supporting a wide range of input and output methods: syslog, unstructured text, queueing, SQL & NoSQL.
https://www.syslog-ng.com
Other
2.13k stars 472 forks source link

csv-parser outputs scrambled messages #1104

Open Cytrian opened 8 years ago

Cytrian commented 8 years ago

Dear sirs,

I've encountered a very strange problem in syslog-ng, at least versions 3.5.6 and 3.7.3. The csv-parser seems to scramble messages that exceed a certain length in a very bad way. Parts of the input are doubled, while other parts are missing.

For example:

The input line is 27-Jun-2016 12:51:15.213 INFO [externalRequestsExecutor-181] 01 abcdefghijklmnopqrstuvwxyz 02 abcdefghijklmnopqrstuvwxyz 03 abcdefghijklmnopqrstuvwxyz 04 abcdefghijklmnopqrstuvwxyz 05 abcdefghijklmnopqrstuvwxyz 06 abcdefghijklmnopqrstuvwxyz 07 abcdefghijklmnopqrstuvwxyz 08 abcdefghijklmnopqrstuvwxyz 09 abcdefghijklmnopqrstuvwxyz 10 abcdefghijklmnopqrstuvwxyz 11 abcdefghijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefOK

Then the output is, as expected:

Jun 29 11:05:30 55802918c8bb reqdebug: 01 abcdefghijklmnopqrstuvwxyz 02 abcdefghijklmnopqrstuvwxyz 03 abcdefghijklmnopqrstuvwxyz 04 abcdefghijklmnopqrstuvwxyz 05 abcdefghijklmnopqrstuvwxyz 06 abcdefghijklmnopqrstuvwxyz 07 abcdefghijklmnopqrstuvwxyz 08 abcdefghijklmnopqrstuvwxyz 09 abcdefghijklmnopqrstuvwxyz 10 abcdefghijklmnopqrstuvwxyz 11 abcdefghijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefOK All numbers are in order here.

If the input is one character longer:

27-Jun-2016 12:51:15.213 INFO [externalRequestsExecutor-181] 01 abcdefghijklmnopqrstuvwxyz 02 abcdefghijklmnopqrstuvwxyz 03 abcdefghijklmnopqrstuvwxyz 04 abcdefghijklmnopqrstuvwxyz 05 abcdefghijklmnopqrstuvwxyz 06 abcdefghijklmnopqrstuvwxyz 07 abcdefghijklmnopqrstuvwxyz 08 abcdefghijklmnopqrstuvwxyz 09 abcdefghijklmnopqrstuvwxyz 10 abcdefghijklmnopqrstuvwxyz 11 abcdefghijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefBAD the output is completely different:

Jun 29 11:14:53 55802918c8bb reqdebug: 01 abcdefghijklmnopqrstuvwxyz 02 abcdefghijklmnopqrstuvwxyz 03 abcdefghijklmnopqrstuvwxyz 04 abcdefghijklmnopqrstuvwxyz 05 abcdefghijklmnopqrstuvwxyz 06 abcdefghijklmnopqrstuvwxyz 07 abcdefghijklmnopqrstuvwxyz 08 abcdefghijklmnopqrstuvwxyz 09 abcdBADijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefBADijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefBAD

The 12 and 13 are doubled, also the word "BAD". 10 and 11 are missing.

My stripped down configuration is this:

@version: 3.7

options { chain_hostnames(off); flush_lines(0); use_dns(no); use_fqdn(no);
          owner("root"); group("adm"); perm(0640); stats_freq(0);
          bad_hostname("^gconfd$");
};

source s_reqdebug { file("/var/log/input.log" follow_freq(1) flags(validate-utf8,no-parse) ); };
filter f_reqdebug { message("^[0-9]{2}\.[a-zA-Z]{3}\.[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}[.,][0-9]{3}"); };

parser p_reqdebug {
        csv-parser(columns("DT_DAY", "MONTH_ABBREV", "DT_YEAR", "TM_HOUR", "TM_MIN", "TM_SEC", "TM_MS", "LVL", "THREAD", "THREAD_2", "MESSAGE")
                flags(greedy)
                delimiters(" :.,-")
        );
};

log { source(s_reqdebug); filter(f_reqdebug);  parser(p_reqdebug); destination(d_syslog); flags(final); };

destination d_syslog { file("/var/log/output.log" owner("root") group("root") perm(0644)); };

The config reads from /var/log/input.log, parses things, and outputs to /var/log/output.log.

I've attached all interesting stuff as a archive. csv-parser-bug.zip

Note: The length at which this errors shows up, is not constant! In the example I've attached the error occurres when changing the length of the message from 399 bytes to 400 bytes.

In the example shown above the length differs from 433 to 434 bytes. So there seems to be some random property involved here.

Our environment: Debian 8 on Intel, with official debian syslog-ng packages (3.5.6) and also with the unofficial, newer packages (version 3.7.3)

I hope you can help us.

bazsi commented 8 years ago

Thanks for the report I am trying to check this out in the coming days. On Jun 29, 2016 4:55 AM, "Cytrian" notifications@github.com wrote:

Dear sirs,

I've encountered a very strange problem in syslog-ng, at least versions 3.5.6 and 3.7.3. The csv-parser seems to scramble messages that exceed a certain length in a very bad way. Parts of the input are doubled, while other parts are missing.

For example:

The input line is

27-Jun-2016 12:51:15.213 INFO [externalRequestsExecutor-181] 01 abcdefghijklmnopqrstuvwxyz 02 abcdefghijklmnopqrstuvwxyz 03 abcdefghijklmnopqrstuvwxyz 04 abcdefghijklmnopqrstuvwxyz 05 abcdefghijklmnopqrstuvwxyz 06 abcdefghijklmnopqrstuvwxyz 07 abcdefghijklmnopqrstuvwxyz 08 abcdefghijklmnopqrstuvwxyz 09 abcdefghijklmnopqrstuvwxyz 10 abcdefghijklmnopqrstuvwxyz 11 abcdefghijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefOK

Then the output is, as expected:

Jun 29 11:05:30 55802918c8bb reqdebug: 01 abcdefghijklmnopqrstuvwxyz 02 abcdefghijklmnopqrstuvwxyz 03 abcdefghijklmnopqrstuvwxyz 04 abcdefghijklmnopqrstuvwxyz 05 abcdefghijklmnopqrstuvwxyz 06 abcdefghijklmnopqrstuvwxyz 07 abcdefghijklmnopqrstuvwxyz 08 abcdefghijklmnopqrstuvwxyz 09 abcdefghijklmnopqrstuvwxyz 10 abcdefghijklmnopqrstuvwxyz 11 abcdefghijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefOK

All numbers are in order here.

If the input is one character longer:

27-Jun-2016 12:51:15.213 INFO [externalRequestsExecutor-181] 01 abcdefghijklmnopqrstuvwxyz 02 abcdefghijklmnopqrstuvwxyz 03 abcdefghijklmnopqrstuvwxyz 04 abcdefghijklmnopqrstuvwxyz 05 abcdefghijklmnopqrstuvwxyz 06 abcdefghijklmnopqrstuvwxyz 07 abcdefghijklmnopqrstuvwxyz 08 abcdefghijklmnopqrstuvwxyz 09 abcdefghijklmnopqrstuvwxyz 10 abcdefghijklmnopqrstuvwxyz 11 abcdefghijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefBAD

the output is completely different:

Jun 29 11:14:53 55802918c8bb reqdebug: 01 abcdefghijklmnopqrstuvwxyz 02 abcdefghijklmnopqrstuvwxyz 03 abcdefghijklmnopqrstuvwxyz 04 abcdefghijklmnopqrstuvwxyz 05 abcdefghijklmnopqrstuvwxyz 06 abcdefghijklmnopqrstuvwxyz 07 abcdefghijklmnopqrstuvwxyz 08 abcdefghijklmnopqrstuvwxyz 09 abcdBADijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefBADijklmnopqrstuvwxyz 12 abcdefghijklmnopqrstuvwxyz 13 abcdefBAD

The 12 and 13 are doubled, also the word "BAD". 10 and 11 are missing.

My stripped down configuration is this:

@version: 3.7

options { chain_hostnames(off); flush_lines(0); use_dns(no); use_fqdn(no); owner("root"); group("adm"); perm(0640); stats_freq(0); bad_hostname("^gconfd$"); };

source s_reqdebug { file("/var/log/input.log" follow_freq(1) flags(validate-utf8,no-parse) ); }; filter f_reqdebug { message("^[0-9]{2}.[a-zA-Z]{3}.[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}[.,][0-9]{3}"); };

parser p_reqdebug { csv-parser(columns("DT_DAY", "MONTH_ABBREV", "DT_YEAR", "TM_HOUR", "TM_MIN", "TM_SEC", "TM_MS", "LVL", "THREAD", "THREAD_2", "MESSAGE") flags(greedy) delimiters(" :.,-") ); };

log { source(s_reqdebug); filter(f_reqdebug); parser(p_reqdebug); destination(d_syslog); flags(final); };

destination d_syslog { file("/var/log/output.log" owner("root") group("root") perm(0644)); };

The config reads from /var/log/input.log, parses things, and outputs to /var/log/output.log.

I've attached all interesting stuff as a archive. csv-parser-bug.zip https://github.com/balabit/syslog-ng/files/339161/csv-parser-bug.zip

Note: The length at which this errors shows up, is not constant! In the example I've attached the error occurres when changing the length of the message from 399 bytes to 400 bytes.

In the example shown above the length differs from 433 to 434 bytes. So there seems to be some random property involved here.

Our environment: Debian 8 on Intel, with official debian syslog-ng packages (3.5.6) and also with the unofficial, newer packages (version 3.7.3)

I hope you can help us.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/balabit/syslog-ng/issues/1104, or mute the thread https://github.com/notifications/unsubscribe/AArldhe5zOiB8qQsMLl55BuOyB-CI4qWks5qQl0bgaJpZM4JBDYN .

Cytrian commented 8 years ago

Were you able to reproduce this behaviour? I find it quite frightening, because it could lead (in our case) to undetected problems.