ptmcg / logmerger

TUI utility to view multiple log files with merged timeline
MIT License

TODO: support user-specified timestamp formats #6

Closed ptmcg closed 10 months ago

ptmcg commented 10 months ago

From online discussion (https://hachyderm.io/@aburka/111045034433964748):

Need to extract the timestamp from these log lines:

\x1b[38;5;117m[1694412570.262003500|INFO](process_name) \x1b[0mSomething interesting happened!
\x1b[1;31m[1694412660.108488351|ERROR](process_name) \x1b[0mSomething bad happened!

ptmcg commented 10 months ago

This is a little complicated if the timestamp value is not at the start of the log line, since I remove the timestamp from the log message output to save screen real estate. The timestamp is already there in its own column, so we don't need to show it again in the log message.

ptmcg commented 10 months ago

Pretty sure I have this worked out now - the format I would look for here would be '(.*m\[)((...)\|)'.

This structure lets me tease apart the pieces to keep and the pieces to remove. The leading group matches the escape sequence and the "[" before the timestamp. The second group marks the parts that will be removed from the log message before displaying it (in your example, the timestamp and the delimiting "|" before the log level). The '(...)' group embedded in the second group is a placeholder where I plug in various timestamp formats; it is what actually extracts and parses the timestamp value. So your format is really just a template around the timestamp value itself, showing the parts before the timestamp to retain and the parts after it to remove (along with the timestamp itself).
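As a rough illustration of the template idea described above (not logmerger's actual code), here is a minimal Python sketch. The names `build_regex`, `split_line`, and `EPOCH_SECONDS` are hypothetical, and the epoch-seconds pattern is an assumption about what gets plugged into the '(...)' placeholder. It parses the timestamp as UTC to stay deterministic, so the hour can differ from a local-time display.

```python
import re
from datetime import datetime, timezone

# Template from the comment above; "(...)" marks where a timestamp regex goes.
TEMPLATE = r"(.*m\[)((...)\|)"

# Assumed pattern for epoch seconds with a fractional part (hypothetical).
EPOCH_SECONDS = r"(\d+\.\d+)"


def build_regex(template: str, timestamp_pattern: str) -> re.Pattern:
    """Plug a concrete timestamp pattern into the "(...)" placeholder."""
    return re.compile(template.replace("(...)", timestamp_pattern, 1))


def split_line(regex: re.Pattern, line: str):
    """Return (timestamp, message with the timestamp part removed).

    Group 1 is kept, group 2 is dropped, group 3 is the raw timestamp text.
    Lines that do not match are returned unchanged with a None timestamp.
    """
    m = regex.match(line)
    if m is None:
        return None, line
    keep_prefix, raw_ts = m.group(1), m.group(3)
    ts = datetime.fromtimestamp(float(raw_ts), tz=timezone.utc)
    return ts, keep_prefix + line[m.end():]


regex = build_regex(TEMPLATE, EPOCH_SECONDS)
sample = (
    "\x1b[38;5;117m[1694412570.262003500|INFO](process_name) "
    "\x1b[0mSomething interesting happened!"
)
ts, message = split_line(regex, sample)
print(ts.isoformat(), repr(message))
```

The kept prefix still contains the escape sequence and the "[", so the message reads "[INFO](process_name) ..." once the escapes are dealt with.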

If I "merge" just your sample lines, I get:

  Line   Timestamp                 .\Working\Log With Esc Sequences.Txt
 ────────────────────────────────────────────────────────────────────────────────────────────────────
   1     2023-09-11 01:09:30.262   [INFO](process_name) Something interesting happened!
   2     2023-09-11 01:11:00.108   [ERROR](process_name) Something bad happened!

The colorizing escape sequences are retained for now, but they make the tabular and interactive formats lose their columnar alignment: those formats just count characters, and do not detect that the escape sequence characters take up no terminal space. We don't see it in this example because there is nothing to the right of these lines, but when I do a display with another log file, the columns get messed up.

The only ways to resolve this now are to strip the colorizing sequences or to replace them with rich-compatible tags.
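For the stripping option, a minimal sketch could look like the following. It is an assumption about one way to do it, not the stripper used in logmerger: the hypothetical `strip_sgr` helper removes only SGR color/style sequences (those ending in "m") and leaves any other escape sequences untouched. It also shows why plain character counting breaks alignment.

```python
import re

# Matches SGR (color/style) sequences such as "\x1b[38;5;117m" and "\x1b[0m".
# Other kinds of escape sequences are deliberately not handled here.
SGR_RE = re.compile(r"\x1b\[[0-9;]*m")


def strip_sgr(text: str) -> str:
    """Remove SGR colorizing sequences from text."""
    return SGR_RE.sub("", text)


raw = "\x1b[1;31m[ERROR](process_name) \x1b[0mSomething bad happened!"
plain = strip_sgr(raw)

# len() counts the escape characters even though a terminal shows none of
# them, so the raw string looks longer than what appears on screen.
print(len(raw), len(plain))
print(plain)
```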

Otherwise, this feature is very close to completion.

durka commented 10 months ago

I had to reread the regex a few times but your explanation of it makes sense!

For the escapes I think you could either strip them (might have to since in TUI mode you are going to apply your own colors?) or make the character counting smarter to know that escapes are zero-width?
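To make the "zero-width escapes" idea concrete, here is a small sketch of escape-aware padding. The helpers `visible_len` and `pad_visible` are hypothetical and are not part of logmerger or of any TUI library; the sketch only handles SGR color sequences and ignores wide (for example East Asian) characters.

```python
import re

# SGR color/style sequences only; an assumption that keeps the sketch short.
SGR_RE = re.compile(r"\x1b\[[0-9;]*m")


def visible_len(text: str) -> int:
    """Length of text as it would appear on screen, ignoring SGR sequences."""
    return len(SGR_RE.sub("", text))


def pad_visible(text: str, width: int) -> str:
    """Right-pad text with spaces so its visible length is at least width."""
    return text + " " * max(0, width - visible_len(text))


colored = "\x1b[1;31mERROR\x1b[0m"

# str.ljust counts the escape characters, so it adds no padding at all here.
naive = colored.ljust(8)
aware = pad_visible(colored, 8)
print(repr(naive))
print(repr(aware))
```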

ptmcg commented 10 months ago

The character counting is out of my control; it is done by the textual package. I have a sequence stripper working now, but the first cut at adding rich tags ran into some other issues. Maybe open a new ticket here to preserve ANSI colorizing escapes.

With stripped escape sequences, this will be good to go in 0.3.0 (which will be a banger!)

ptmcg commented 10 months ago

I've committed this code, if you wouldn't mind trying it out to see how it looks for you, before I make a 0.3.0 release and then find I didn't get things quite right.

durka commented 10 months ago

Yes, it's working with the pattern you gave for --timestamp_format!

ptmcg commented 10 months ago

Released in 0.3.0