microsoft / gctoolkit

Tool for parsing GC logs
https://github.com/microsoft/gctoolkit
MIT License
1.24k stars 153 forks source link

I find gctoolkit can't parse a CMF my gc log #193

Open karianna opened 2 years ago

karianna commented 2 years ago

I find gctoolkit can't parse my gc log and show that Missed: 2021-12-19T22:31:54.909+0800: 623.525: [Full GC (Allocation Failure) 2021-12-19T22:31:54.909+0800: 623.525: [CMS (concurrent mode failure): 2097152K->2097152K(2097152K), 9.4110998 secs] 3984639K->3984638K(3984640K), [Metaspace: 215219K->215219K(1241088K)], 9.4112098 secs] [Times: user=9.27 sys=0.15, real=9.41 secs] I think gctoolkit should add or modify a rule to match the line of gc log. By the way, I suggest we should build an issue to record the missing rule which used to strengthen gctoolkit.

Originally posted by @ZhangShushu123 in https://github.com/microsoft/gctoolkit/discussions/192

karianna commented 2 years ago

@ZhangShushu123 - Would you be willing to try and adjust the pattern? I'd love to see what we're missing in terms of helping contributors make changes like that :-)

ZhangShushu123 commented 2 years ago

Ok, I will follow this issue and try my best to contribute it :-)

ZhangShushu123 commented 2 years ago

I think the solution of this issue is changing the CMS_TENURED_BLOCK from DATE_TIMESTAMP + "\\[CMS: " + BEFORE_AFTER_CONFIGURED_PAUSE + "\\]" to DATE_TIMESTAMP + "\\[CMS(?:| \(concurrent mode failure\)): " + BEFORE_AFTER_CONFIGURED_PAUSE + "\\]"

karianna commented 2 years ago

@kirk-microsoft - WDYT?

kcpeppe commented 2 years ago

Ok, this is weird. I was following this thread and I didn’t see any new comments show up until now. This might work however, and unfortunately, without properly functional named capture groups, the parsers rely capture group indexes which is very fragile. Since this rule fragment is used in many different places, one needs to be careful as to not disturb the indexing. This is also an area that deserves better test coverage. That said, you’ll end up breaking a test some where when this happens. Second issue is in the interpretation of the data. This is a full GC inside a concurrent cycle. Is it in or out of a concurrent phase. It would appear to not be. Anyways, the Point is, we need to present the event so that any interested downstream aggregators will have a chance at understanding what has happened so they can capture and report on this issue.

karianna commented 2 years ago

Ok, this is weird. I was following this thread and I didn’t see any new comments show up until now. This might work however, and unfortunately, without properly functional named capture groups, the parsers rely capture group indexes which is very fragile. Since this rule fragment is used in many different places, one needs to be careful as to not disturb the indexing. This is also an area that deserves better test coverage. That said, you’ll end up breaking a test some where when this happens. Second issue is in the interpretation of the data. This is a full GC inside a concurrent cycle. Is it in or out of a concurrent phase. It would appear to not be. Anyways, the Point is, we need to present the event so that any interested downstream aggregators will have a chance at understanding what has happened so they can capture and report on this issue.

What's the proposed next step here do you think? We wait until we have named capture groups before fixing this one?

kcpeppe commented 2 years ago

I was trying to clarify the issues without being prescriptive. That said, I’m happy to eval a PR with this solution.

messi612 commented 2 years ago

I have the same problem and expect it to be solved