thekrakken / java-grok

Simple API that allows you to easily parse logs and other files
http://grok.nflabs.com/

GROK Multiline log parsing #102

Open daggumalli opened 6 years ago

daggumalli commented 6 years ago

I am trying to parse multiline logs using GROK, but the result omits everything after the newline. Example code below.

String log = "a|b|c|d" + "\n" + "e";
Pattern = (?m)(?<ErrMsg>.*)

Output: ErrMsg = a|b|c|d

Any help would be appreciated!

ottobackwards commented 6 years ago

This is because the .* pattern, without DOTALL (the (?s) flag), doesn't match past the newline. The (?m) flag only changes how ^ and $ behave; it does not let . match a line break.

ottobackwards commented 6 years ago

I was able to get your sample to work with (?m)(?<ErrMsg>.*\\R.*)
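
For reference, a minimal sketch of the behaviour described above, using plain java.util.regex rather than java-grok itself. The class and method names are made up for illustration, and it assumes the pattern reaches the regex engine as an ordinary Java regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotNewlineDemo {
    // Returns the ErrMsg group of the first match, or null if nothing matches.
    static String firstErrMsg(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group("ErrMsg") : null;
    }

    public static void main(String[] args) {
        String log = "a|b|c|d" + "\n" + "e";

        // (?m) only changes ^ and $, so '.' still stops at the newline.
        System.out.println(firstErrMsg("(?m)(?<ErrMsg>.*)", log));

        // (?s) is DOTALL: '.' now matches line terminators as well.
        System.out.println(firstErrMsg("(?s)(?<ErrMsg>.*)", log));

        // \R matches exactly one line break (Java 8 and later).
        System.out.println(firstErrMsg("(?m)(?<ErrMsg>.*\\R.*)", log));
    }
}
```

The \R variant captures exactly two lines, while (?s) captures everything up to the end of the input, so which one fits depends on how many lines an entry can span.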

gruselglatz commented 5 years ago

I am trying to parse this log with NiFi, which uses this library, but it fails every time. It works fine in Kibana and Graylog, but not in NiFi. Can I trick it into not stopping at the end of a line? Strangely, the (?m) flag doesn't help.

2019-09-24 08:52:46,881 [INFO ] 00000000 Dashboard loading performance: 
    beforeLoadingComponentsAndPrompts: 1042 ms
    dashboardComponentsCreated: 2082 ms
    dashboardInitialContentLinkingComplete: 2160 ms
    dashboardInitialRenderComplete: 3285 ms
    path: /public/Dashboards/PVP/PVP_Einstieg_MO/PVP_Einstieg_MO.stdb
    browser: Chrome 79
    httpRequestCount: 32
    hasInputFilters: false
    InputComponentCount: 1
    TableComponentCount: 1
    CDFComponentCount: 1
    PreselectComponentCount: 2
    ContentGenerator: 55 ms
    CreateChartConfig (F2): 21 ms
    CreateChartConfig (F1): 14 ms
    Get CDA Parameter from CDA file (F2): 2 ms
    UserAgent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3920.0 Safari/537.36
 [https-openssl-nio-9085-exec-547] (ams.plugin.api.LoggingAPI)

ottobackwards commented 5 years ago

No, NiFi literally reads line by line and passes each line to grok. If you are using NiFi, you could use another processor to modify the content first, for example replacing "\n" with "|" or something similar, and then adjust your grok pattern to account for the change. A ReplaceText processor could do this.
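
A rough sketch of that idea in plain Java, outside NiFi: collapse the line breaks of one entry into "|" and then match the result as a single line. The flatten helper, the regex, and the group names are assumptions for illustration, not NiFi or java-grok APIs; in NiFi the first step would be handled by something like ReplaceText.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlattenEntryDemo {
    // Hypothetical single-line pattern for a flattened entry (not a shipped grok pattern).
    static final Pattern ENTRY = Pattern.compile(
        "^(?<timestamp>\\S+ \\S+) \\[(?<level>\\w+)\\s*\\] (?<id>\\d{8}) (?<message>.*)$");

    // Replace each line break, plus the indentation that follows it, with "|".
    static String flatten(String entry) {
        return entry.replaceAll("\\R\\s*", "|");
    }

    public static void main(String[] args) {
        String entry =
            "2019-09-24 08:52:46,881 [INFO ] 00000000 Dashboard loading performance: \n"
            + "    browser: Chrome 79\n"
            + "    httpRequestCount: 32";

        Matcher m = ENTRY.matcher(flatten(entry));
        if (m.matches()) {
            System.out.println(m.group("timestamp"));
            System.out.println(m.group("message"));
        }
    }
}
```

This only works if "|" never occurs inside the log values themselves; otherwise a different delimiter would be needed.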

gruselglatz commented 5 years ago

I tried pretty much everything, but the problem is that this entry sits in the flow file between normal log lines. Do you know some magic to extract only this entry from the others?

I tried nearly everything I found online and came up with this regex, which should extract only these messages, but NiFi handles it differently, and now I think I'll give up -.-

(^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s\[.*\s{0,3}\]\s\d{8}\sDashboard\sloading\sperformance:.*(?:(?:\r\n|[\r\n])(?!\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3}\s\[.*\s{0,3}\]\s\d{8}).*)*(?:\r\n|[\r\n])?)

Is there really no option to enable multiline in NiFi's grok? Or can I fork it and recompile a new processor with this option enabled? (I am no Java dev :( )
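
For what it's worth, the regex above does appear to pick out the entry when it is run over the whole content at once with java.util.regex and the MULTILINE flag, as in the sketch below. The class name, the PREFIX constant, and the sample input are made up for illustration; nothing here changes how NiFi feeds lines to grok.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DashboardEntryExtractor {
    // Shared prefix of the regex above: timestamp, [LEVEL], 8-digit id.
    static final String PREFIX =
        "\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2},\\d{3}\\s\\[.*\\s{0,3}\\]\\s\\d{8}";

    // Same expression as posted, assembled from PREFIX to keep the Java string readable.
    // MULTILINE is needed so that ^ matches at the start of every line.
    static final Pattern ENTRY = Pattern.compile(
        "(^" + PREFIX + "\\sDashboard\\sloading\\sperformance:.*"
            + "(?:(?:\\r\\n|[\\r\\n])(?!" + PREFIX + ").*)*(?:\\r\\n|[\\r\\n])?)",
        Pattern.MULTILINE);

    // Returns every multi-line "Dashboard loading performance" entry in the content.
    static List<String> extract(String content) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(content);
        while (m.find()) {
            entries.add(m.group(1));
        }
        return entries;
    }

    public static void main(String[] args) {
        String content =
            "2019-09-24 08:52:46,000 [INFO ] 00000000 normal line\n"
            + "2019-09-24 08:52:46,881 [INFO ] 00000000 Dashboard loading performance: \n"
            + "    browser: Chrome 79\n"
            + "2019-09-24 08:52:47,000 [INFO ] 00000000 another line";
        extract(content).forEach(System.out::print);
    }
}
```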

ottobackwards commented 5 years ago

Again: https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/main/java/org/apache/nifi/grok/GrokRecordReader.java#L85

It reads the content line by line. There is no workaround in the record reader.

You can create an issue in the NiFi Jira, attaching a sanitized sample file / data that can be used to test the parsing, and a flow template if you can.

I would think you’d be looking at a flow like

[source] -> flow file with multiple multi-line entries delimited by ??? (empty line?) -> SplitContent or something -> one flow file per entry -> ReplaceText to get rid of newlines -> ???? with the grok record reader -> ???? -> Profit
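
The splitting and flattening steps of such a flow could look roughly like this in plain Java. This is only a sketch: it assumes every new entry starts with a "yyyy-MM-dd HH:mm:ss,SSS" timestamp (taken from the sample log in this thread) instead of an empty-line delimiter, and the class and method names are hypothetical, not NiFi APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class EntryGrouper {
    // Assumption: a line that starts with this timestamp begins a new entry.
    private static final Pattern ENTRY_START =
        Pattern.compile("^\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2},\\d{3}\\s");

    // Groups continuation lines with the timestamped line before them,
    // joining the pieces with "|" so each entry becomes a single line.
    static List<String> group(String content) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            if (ENTRY_START.matcher(line).find()) {
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(trimmed);
            } else if (current != null) {
                current.append('|').append(trimmed);
            }
            // Lines before the first timestamped line are dropped.
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }

    public static void main(String[] args) {
        String content =
            "2019-09-24 08:52:46,000 [INFO ] 00000000 normal line\n"
            + "2019-09-24 08:52:46,881 [INFO ] 00000000 Dashboard loading performance: \n"
            + "    browser: Chrome 79\n"
            + "    httpRequestCount: 32\n"
            + "2019-09-24 08:52:47,000 [INFO ] 00000000 another line";
        group(content).forEach(System.out::println);
    }
}
```

Each resulting line could then be handed to a line-oriented reader, with the pattern adjusted to expect "|" where the line breaks used to be.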

ottobackwards commented 5 years ago

You can also try posting to the users@nifi.apache.org mailing list.
