zarunbal / LogExpert

Windows tail program and log file analyzer.
Other
956 stars 163 forks source link

Question About Multi Line Custom Columnizer #64

Open Rowton opened 6 years ago

Rowton commented 6 years ago

LogExpert version 1.6.13 on Windows 7

I am currently trying to write a custom columnizer. While it is basically working I do have one issue and I do not know if I'm misunderstanding the LogExpert functionality, or I have made a mistake in the code.

I have a log file that contains multi-line entries (e.g. stack traces associated with a single log file entry), but the entries do have start and end tags so I can reliably identify a single entry, even when it spans multiple lines. Using the bundled Glassfish columnizer as an example, I used the ILogLineXmlColumnizer interface with a class implementing the IXmlLogConfiguration interface setting the XmlStartTag & XmlEndTag appropriately and the Stylesheet to null (As an aside this was quite confusing that you can use ILogLineXmlColumnizer with non-XML data), thus I thought that the ILogLine line parameter passed to the SplitLine method which represents a single log file entry would contain new line characters, I would then process as normal.

I appears I was wrong or I have made a mistake in my code, as each of the ILogLine lines passed to the SplitLine method seems to be a single text file line, still split by each and every new line character. Thus each line of the stack trace (all associated with a single log file entry) is treated as a separate log file entry by LogExpert.

While I can identify these lines and locate them into the correct column so the data displayed in LogExpert initially appears correct, functionality like the highlighting and filtering is severely limited, for example if I highlight or filter based on say a log level keyword I would expect the entire multi line log entry to be highlighted or displayed in the filter window (not just the first line which has the log level on it).

Hopefully this makes sense? So my question is, is LogExpert working as expected or is the a bug somewhere, quite possibly in my custom columnizer code?

In an attempt to try to answer my own question I did go digging in the LogExpert source code and did spot some things which may be relevant, though please be aware it was a VERY brief look and I might not understand things correctly or how the code fits into the big picture. In the class "XmlBlockSplitter" there is a method "ParseXmlBlock" and this is where it appears to decide if your data is XML or generic multi line text:

private void ParseXmlBlock(string block) { if (this.stylesheet != null) { XmlReader xmlReader = XmlReader.Create(new StringReader(block), settings, context);

    xmlReader.Read();
    xmlReader.MoveToContent();
    //xmlReader.MoveToContent();
    StringWriter textWriter = new StringWriter();

    this.xslt.Transform(xmlReader, null, textWriter);
    string message = textWriter.ToString();
    SplitToLinesList(message);
}
else
{
    SplitToLinesList(block);
    //this.lineList.Add(block);   // TODO: make configurable, if block has to be splitted
}

}

Note the TODO comment. At the moment the text block is always passed to SplitToLinesList where it is split by the new line strings "\r\n", "\n", or "\r".

I would welcome any comments on my analysis.

Kind Regards,

Rowton

zarunbal commented 6 years ago

So as far as I understand you have

Took a quick look into the code of the XML parts. I think that there a few challenges which needs some more time to look into them. Your data is multiline and not XML. The Glassfish Columnizer works because it logs only on one line. The Log4J Columnizer uses xslt tranformation to change the XML into one line with special character seperated columns (which will then be later splitted by the columnizers into the columns). The next challenge would be to display the Stacktrace multiline. As far as I digged into the code the current implementation is only for one line.

Would you share your implementation and some example how your log entries would look like? Maybe we could together work out something. My current idea would be to extend the implementation with some more extension points.

Just as sidenote that my free time schedule is currently tight and responses might be delayed ;)

I hope that I could help you a bit with my comments.

Rowton commented 6 years ago

Thank you for taking the time to look at my question, it's very much appreciated, your 3 points of understanding were correct.

My columnizer is for a specific Glassfish application, the log file follows the specification found here:

https://docs.oracle.com/cd/E18930_01/html/821-2416/abluk.html#gkmex

The requirement for my custom columnizer is to further split up the "Key Value Pairs" data into structured columns, these are specific to our Glassfish application, ending up with a 16 column LogExpert column view.

In principle I am more than happy to share my code, but at the moment I do not feel I can share the project as a whole "as is" due to it containing company specific information, I will look at reworking my code to create a "generic" columnizer to demonstrate my issue/attempted solution, basically this would mean stripping out the code that splits the "Key Value Pairs" column, leaving a pure 6 column Glassfish Columnizer, I think this will be fine for our purposes as it is the final "Message" column that contains the multi-line data. If we can work together to implement a solution I would be more than happy to contribute a new multi line Glassfish columnizer back to the project (From looking at the current Glassfish columnizer code it just splits into two columns, timestamp, and everything else for the purpose of time shifting).

While I will get the code posted here, hopefully fairly soon, I'll just let you know about my other investigations I undertook after I submitted my initial post but before you replied.

I investigated further and came up with a couple of (ultimately failed) ideas to achieve my goal.

First off I looked at implementing the IPreProcessColumnizer interface to possibly manipulate the data (I've used this before to strip out blank lines), what I was hoping for was the line data that the method PreProcessLine provided you would of been after the extraction by start and end tags, but before the method ParseXmlBlock was run, hence it would still contain the multiple lines. I was then thinking of crudely replacing the newlines with a different character, say a tab, and then in the main SplitLine method, converting tabs back to newlines (from your reply, this appears to be how the Log4J columnizer works). When I tried this is appears the line data provided to the method PreProcessLine is after the splitting by newline, thus this idea does not work.

Secondly, as I can reliably identify if a line is the first line of a new event, or a second or subsequent line of a multi line event. I looked at using the callback parameter of the SplitLine method to get the "next" line, see if it was part of the current message and if so append it, then repeat until the "next" line was actually a new event. This combined with the SplitLine method ignoring any line that is not a new event meant that I could get the whole event onto a single line, but now my log expert view was full of blank blocks as you can't actually discard lines once you get to the SplitLine method.

To be honest I didn't really like either of the above attempted solutions as the first has you actually editing the log data, and the second ends up with you having to parse each line multiple times from within the SplitLine method, and as mentioned in the Wiki you want to make the execution of this as fast as possible.

With regard to displaying the multi line data, with my second attempt above I did get log events with a multiple line message to display in the LogExpert grid view, LogExpert displayed the data as a single line, but if you copied the data and pasted into notepad the newlines were preserved and the data was displayed correctly. I think this could be separated into two deliverables, one to process multi line data and two the display. While it would be great to get the proper display of multi line data, just having the multi line data in a single log event line would be really beneficial on its own. As now filtering and highlighting would work correctly and you can always cut and paste the data out to get it displayed in the original format.

No worries about speed of responses, I fully understand this is a project in your free time.

pcollardez commented 2 years ago

As I see it, the problem is that a single line does not always represent a logging event.

Maybe adding an event ID into ILogLine or IColumnizedLogLine, and letting the ILogLineColumnizer defining it would be enough.

For really special cases, maybe having a way to read some previous lines (even columnized) would be nice.

The event ID could be auto-incremented by a default implementation.

Then the event ID could be exposed to the highlighting part (apply highlighting to the line or to the event) and the clipboard part (copy the line or the event).