influxdata / go-syslog

Blazing fast syslog parser
MIT License
476 stars 69 forks source link

Allow non UTF8 characters in message #35

Closed bastjan closed 4 years ago

bastjan commented 4 years ago
MSG             = MSG-ANY / MSG-UTF8
MSG-ANY         = *OCTET ; not starting with BOM
MSG-UTF8        = BOM UTF-8-STRING

Usage

message, err := rfc5424.NewMachine(rfc5424.WithCompliantMsg()).Parse(subject)

Performance

Initialization of machine moved out of benchmark for all comparisons.

develop...allow-non-utf8

benchstat

``` name old time/op new time/op delta Parse/[no]_empty_input__________________________________-8 409ns ± 1% 408ns ± 1% ~ (p=0.416 n=10+10) Parse/[no]_multiple_syslog_messages_on_multiple_lines___-8 494ns ± 0% 509ns ± 1% +2.97% (p=0.000 n=10+10) Parse/[no]_impossible_timestamp_________________________-8 1.47µs ± 0% 1.48µs ± 1% +0.82% (p=0.000 n=10+9) Parse/[no]_malformed_structured_data____________________-8 681ns ± 0% 707ns ± 3% +3.90% (p=0.000 n=9+10) Parse/[no]_with_duplicated_structured_data_id___________-8 1.92µs ± 1% 1.99µs ± 1% +3.38% (p=0.000 n=10+9) Parse/[ok]_minimal______________________________________-8 254ns ± 0% 263ns ± 1% +3.43% (p=0.000 n=7+10) Parse/[ok]_average_message______________________________-8 2.67µs ± 1% 2.64µs ± 2% ~ (p=0.271 n=10+10) Parse/[ok]_complicated_message__________________________-8 2.12µs ± 0% 2.09µs ± 0% -1.60% (p=0.000 n=10+8) Parse/[ok]_very_long_message____________________________-8 5.16µs ± 1% 4.60µs ± 0% -10.89% (p=0.000 n=10+8) Parse/[ok]_all_max_length_and_complete__________________-8 5.14µs ± 1% 5.16µs ± 2% ~ (p=0.645 n=10+10) Parse/[ok]_all_max_length_except_structured_data_and_mes-8 3.73µs ± 0% 3.72µs ± 1% -0.33% (p=0.023 n=10+10) Parse/[ok]_minimal_with_message_containing_newline______-8 287ns ± 1% 283ns ± 0% -1.43% (p=0.000 n=10+7) Parse/[ok]_w/o_procid,_w/o_structured_data,_with_message-8 1.06µs ± 1% 1.02µs ± 1% -3.66% (p=0.000 n=8+10) Parse/[ok]_minimal_with_UTF-8_message___________________-8 524ns ± 1% 523ns ± 0% ~ (p=0.850 n=10+9) Parse/[ok]_minimal_with_UTF-8_message_starting_with_BOM_-8 532ns ± 1% 526ns ± 0% -1.27% (p=0.000 n=10+9) Parse/[ok]_with_structured_data_id,_w/o_structured_data_-8 1.08µs ± 0% 1.08µs ± 1% +0.88% (p=0.000 n=9+10) Parse/[ok]_with_multiple_structured_data________________-8 1.77µs ± 0% 1.71µs ± 1% -3.63% (p=0.000 n=10+10) Parse/[ok]_with_escaped_backslash_within_structured_data-8 1.61µs ± 1% 1.60µs ± 0% -1.03% (p=0.000 n=10+9) Parse/[ok]_with_UTF-8_structured_data_param_value,_with_-8 1.86µs ± 0% 1.79µs ± 0% -3.39% (p=0.000 n=10+10) ```

develop...allow-non-utf8.WithCompliantMsg

benchstat

``` name old time/op new time/op delta Parse/[no]_empty_input__________________________________-8 409ns ± 1% 411ns ± 1% ~ (p=0.124 n=10+10) Parse/[no]_multiple_syslog_messages_on_multiple_lines___-8 494ns ± 0% 511ns ± 0% +3.38% (p=0.000 n=10+10) Parse/[no]_impossible_timestamp_________________________-8 1.47µs ± 0% 1.49µs ± 1% +1.59% (p=0.000 n=10+10) Parse/[no]_malformed_structured_data____________________-8 681ns ± 0% 689ns ± 0% +1.19% (p=0.000 n=9+9) Parse/[no]_with_duplicated_structured_data_id___________-8 1.92µs ± 1% 1.91µs ± 0% -0.43% (p=0.005 n=10+10) Parse/[ok]_minimal______________________________________-8 254ns ± 0% 254ns ± 0% ~ (p=0.103 n=7+10) Parse/[ok]_average_message______________________________-8 2.67µs ± 1% 2.62µs ± 0% -1.72% (p=0.000 n=10+9) Parse/[ok]_complicated_message__________________________-8 2.12µs ± 0% 2.08µs ± 0% -1.76% (p=0.000 n=10+10) Parse/[ok]_very_long_message____________________________-8 5.16µs ± 1% 4.56µs ± 0% -11.66% (p=0.000 n=10+10) Parse/[ok]_all_max_length_and_complete__________________-8 5.14µs ± 1% 5.11µs ± 0% -0.42% (p=0.009 n=10+10) Parse/[ok]_all_max_length_except_structured_data_and_mes-8 3.73µs ± 0% 3.74µs ± 1% ~ (p=0.492 n=10+10) Parse/[ok]_minimal_with_message_containing_newline______-8 287ns ± 1% 283ns ± 0% -1.32% (p=0.000 n=10+10) Parse/[ok]_w/o_procid,_w/o_structured_data,_with_message-8 1.06µs ± 1% 1.07µs ± 1% +0.94% (p=0.001 n=8+9) Parse/[ok]_minimal_with_UTF-8_message___________________-8 524ns ± 1% 523ns ± 1% ~ (p=0.597 n=10+10) Parse/[ok]_minimal_with_UTF-8_message_starting_with_BOM_-8 532ns ± 1% 531ns ± 0% ~ (p=0.693 n=10+8) Parse/[ok]_with_structured_data_id,_w/o_structured_data_-8 1.08µs ± 0% 1.08µs ± 0% +0.81% (p=0.000 n=9+10) Parse/[ok]_with_multiple_structured_data________________-8 1.77µs ± 0% 1.71µs ± 0% -3.44% (p=0.000 n=10+10) Parse/[ok]_with_escaped_backslash_within_structured_data-8 1.61µs ± 1% 1.61µs ± 1% ~ (p=0.926 n=10+10) Parse/[ok]_with_UTF-8_structured_data_param_value,_with_-8 1.86µs ± 0% 1.80µs ± 1% -2.95% (p=0.000 n=10+9) ```

allow-non-utf8...allow-non-utf8.WithCompliantMsg

benchstat

``` name old time/op new time/op delta Parse/[no]_empty_input__________________________________-8 408ns ± 1% 411ns ± 1% +0.66% (p=0.022 n=10+10) Parse/[no]_multiple_syslog_messages_on_multiple_lines___-8 509ns ± 1% 511ns ± 0% +0.39% (p=0.017 n=10+10) Parse/[no]_impossible_timestamp_________________________-8 1.48µs ± 1% 1.49µs ± 1% +0.77% (p=0.006 n=9+10) Parse/[no]_malformed_structured_data____________________-8 707ns ± 3% 689ns ± 0% -2.61% (p=0.002 n=10+9) Parse/[no]_with_duplicated_structured_data_id___________-8 1.99µs ± 1% 1.91µs ± 0% -3.69% (p=0.000 n=9+10) Parse/[ok]_minimal______________________________________-8 263ns ± 1% 254ns ± 0% -3.43% (p=0.000 n=10+10) Parse/[ok]_average_message______________________________-8 2.64µs ± 2% 2.62µs ± 0% ~ (p=0.483 n=10+9) Parse/[ok]_complicated_message__________________________-8 2.09µs ± 0% 2.08µs ± 0% ~ (p=0.163 n=8+10) Parse/[ok]_very_long_message____________________________-8 4.60µs ± 0% 4.56µs ± 0% -0.87% (p=0.000 n=8+10) Parse/[ok]_all_max_length_and_complete__________________-8 5.16µs ± 2% 5.11µs ± 0% ~ (p=0.190 n=10+10) Parse/[ok]_all_max_length_except_structured_data_and_mes-8 3.72µs ± 1% 3.74µs ± 1% +0.50% (p=0.006 n=10+10) Parse/[ok]_minimal_with_message_containing_newline______-8 283ns ± 0% 283ns ± 0% ~ (p=0.353 n=7+10) Parse/[ok]_w/o_procid,_w/o_structured_data,_with_message-8 1.02µs ± 1% 1.07µs ± 1% +4.78% (p=0.000 n=10+9) Parse/[ok]_minimal_with_UTF-8_message___________________-8 523ns ± 0% 523ns ± 1% ~ (p=0.525 n=9+10) Parse/[ok]_minimal_with_UTF-8_message_starting_with_BOM_-8 526ns ± 0% 531ns ± 0% +1.11% (p=0.000 n=9+8) Parse/[ok]_with_structured_data_id,_w/o_structured_data_-8 1.08µs ± 1% 1.08µs ± 0% ~ (p=0.926 n=10+10) Parse/[ok]_with_multiple_structured_data________________-8 1.71µs ± 1% 1.71µs ± 0% ~ (p=0.181 n=10+10) Parse/[ok]_with_escaped_backslash_within_structured_data-8 1.60µs ± 0% 1.61µs ± 1% +1.08% (p=0.000 n=9+10) Parse/[ok]_with_UTF-8_structured_data_param_value,_with_-8 1.79µs ± 0% 1.80µs ± 1% +0.45% (p=0.000 n=10+9) ```

Fixes #21

bastjan commented 4 years ago

Thanks for the review and the refinements!

I think the name WithCompliantMsg reflects the intended option better and I'm all for it.

Since the default behavior (no option passed in) stays as is, if we want, we could also relax it more, which should lead to even better performances. In this case a lot of test cases should be updated, tho. :)

So the default (no option) would be relaxed to accept any message; WithCompliantMsg would make the parser more strict?

leodido commented 4 years ago

Yes, exactly @bastjan :)

bastjan commented 4 years ago

Sounds good to me! I could tackle updating the test cases over the next two days if you need any help.

leodido commented 4 years ago

I can proceed right now with the option rename.

But, I'd love if you could help me with the second and last step of this PR (relaxing the default behaviour and refactoring the test cases)

bastjan commented 4 years ago

But, I'd love if you could help me with the second and last step of this PR (relaxing the default behaviour and refactoring the test cases)

Sure, glad to help :)

bastjan commented 4 years ago

Hey @leodido,

I relaxed the default parsing, updated the tests (there are many 😅), and rerun the benchmarks in the PR description. Mind to take a look? :)

I also relaxed the builder to not create a situation where a parser->builder roundtrip would remove the MSG. Is this alright with you?

leodido commented 4 years ago

Hey @bastjan yes, relaxing the builder was needed.

Ideally, in a second PR, we should approach the same topic for builder.

Ie., making the builder configurable (compliant message or not) and its SetMessage to work accordingly. There are already (empty) builder test cases for such scenarios.

Anyways, you did a great job, thanks for approaching issue #21 .. I'm about to push some little final refinements and then merge this 🎉