ptmcg / logmerger

TUI utility to view multiple log files with merged timeline
MIT License
160 stars 5 forks source link

Can't parse some datetime strings #43

Open RTnhN opened 1 day ago

RTnhN commented 1 day ago

With the following logs, I ran into a parser issue:

Traceback (most recent call last):
  File "C:\Users\Zachs\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Zachs\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Zachs\AppData\Local\Programs\Python\Python310\Scripts\logmerger.exe\__main__.py", line 7, in <module>
  File "C:\Users\Zachs\repos\logmerger\logmerger\logmerger.py", line 413, in main
    app.run()
  File "C:\Users\Zachs\repos\logmerger\logmerger\logmerger.py", line 249, in run
    merged_lines_table.insert_many(merged_lines)
  File "C:\Users\Zachs\AppData\Local\Programs\Python\Python310\lib\site-packages\littletable.py", line 1933, in insert_many
    first = next(first_obj)
  File "C:\Users\Zachs\repos\logmerger\logmerger\logmerger.py", line 357, in _merge_log_file_lines
    for line_number, (timestamp, items) in enumerate(merger, start=1):
  File "C:\Users\Zachs\repos\logmerger\logmerger\merging.py", line 34, in __next__
    return next(self.heap_iter)
  File "C:\Users\Zachs\AppData\Local\Programs\Python\Python310\lib\heapq.py", line 373, in merge
    value = next()
  File "C:\Users\Zachs\repos\logmerger\logmerger\logmerger.py", line 178, in _inner
    yield from ((s, obj) for obj in seq)
  File "C:\Users\Zachs\repos\logmerger\logmerger\logmerger.py", line 178, in <genexpr>
    yield from ((s, obj) for obj in seq)
  File "C:\Users\Zachs\repos\logmerger\logmerger\multiline_log_handler.py", line 115, in __call__
    for timestamp, lines in WindowedSort(
  File "C:\Users\Zachs\repos\logmerger\logmerger\multiline_log_handler.py", line 18, in __init__
    temp = list(islice(self.seq_iter, window))
  File "C:\Users\Zachs\repos\logmerger\logmerger\multiline_log_handler.py", line 117, in <genexpr>
    seq=((a, list(b)) for a, b in groupby(log_seq, key=self._newlogline_detector)),
  File "C:\Users\Zachs\repos\logmerger\logmerger\timestamp_wrapper.py", line 149, in __call__
    ret = self.str_to_time(m[self.timestamp_match_group]), trimmed_obj
ValueError: Invalid isoformat string: '2024-08-25 18:35:55,882'

I think that the datetime.fromisoformat does not support comma for the millisecond section. For example:

datetime.fromisoformat("2024-08-24 12:12:12.111")
datetime.datetime(2024, 8, 24, 12, 12, 12, 111000)

but

datetime.fromisoformat("2024-08-24 12:12:12,111")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: Invalid isoformat string: '2024-08-24 12:12:12,111'

I think it might need to have a custom written datetime. Actually, it seems that every one of your formatters uses the same datetime.fromisoformat function for the strptime_format?

Here are the two log sections that I was trying to use:

test.log

2024-08-25 18:35:55,882 INFO:sage_driver.cc:223:(RunDriver)Beginning Driver process
2024-08-25 18:35:56,475 INFO:spi_wrapper.cc:64:(InitializeSpi)Sage driver began with IRQ not asserted
2024-08-25 18:36:09,404 INFO:driver_connections.cc:327:(ProcessHwCtrlMsg)Receive HwCtrl message for SwitchWT41U
2024-08-25 18:36:18,060 INFO:driver_connections.cc:319:(ProcessHwCtrlMsg)Receive HwCtrl message for SwitchBT121
2024-08-25 18:36:18,110 WARN:receive_buffer.c:206:(UpdateRxBuffer)received 424 headless data
2024-08-25 18:36:19,738 INFO:driver_connections.cc:319:(ProcessHwCtrlMsg)Receive HwCtrl message for SwitchBT121
2024-08-25 18:36:19,740 INFO:driver_connections.cc:319:(ProcessHwCtrlMsg)Receive HwCtrl message for SwitchBT121
2024-08-25 18:36:19,742 INFO:driver_connections.cc:319:(ProcessHwCtrlMsg)Receive HwCtrl message for SwitchBT121

test2.log

2024-08-25 18:36:06,624 INFO:default_dpm.py:277:(MainThread)Starting Data Processing Module
2024-08-25 18:36:06,767 INFO:default_dpm.py:209:(MainThread)Start new app process
2024-08-25 18:36:06,803 INFO:default_dpm.py:203:(MainThread)App 0301_0 enters to new state LOADING
2024-08-25 18:36:06,817 INFO:default_dpm.py:192:(MainThread)App 0301_0 is importing from RawData+1_0+030
2024-08-25 18:36:07,066 INFO:default_dpm.py:203:(MainThread)App 0301_0 enters to new state CONFIGURE
2024-08-25 18:36:23,399 INFO:process.py:108:(MainThread)Receive app control request. app_version_id: 0201_1; should_run: False
2024-08-25 18:36:23,414 INFO:default_dpm.py:209:(MainThread)Start new app process
2024-08-25 18:36:23,453 INFO:default_dpm.py:203:(MainThread)App 0201_1 enters to new state LOADING
2024-08-25 18:36:23,463 INFO:default_dpm.py:192:(MainThread)App 0201_1 is importing from AngleFeedbackTest+1_1+020
2024-08-25 18:36:23,515 INFO:default_dpm.py:57:(MainThread)App 0201_1 is selected and stopped
2024-08-25 18:36:23,517 INFO:default_dpm.py:203:(MainThread)App 0201_1 enters to new state CONFIGURE
2024-08-25 18:36:41,464 INFO:process.py:108:(MainThread)Receive app control request. app_version_id: ; should_run: False
2024-08-25 18:36:41,467 INFO:default_dpm.py:57:(MainThread)App 0201_1 is selected and stopped
2024-08-25 18:37:48,916 INFO:process.py:108:(MainThread)Receive app control request. app_version_id: ; should_run: False
RTnhN commented 1 day ago

Maybe it would be good to extend the tests to include logs with the formats that you support out of the box instead of just plain iso format?

RTnhN commented 1 day ago

It seems that maybe each instance of strptime_format could just be replaced by the string equivalency of it. For example, strptime_format = "%Y-%m-%d %H:%M:%S,%f" would be for the comma ms format that I use. The following code will take the string and convert it into a function that will do the actual conversion:

https://github.com/ptmcg/logmerger/blob/f93a5f415896cc91ceeec7a214e723fed4b5b34f/logmerger/timestamp_wrapper.py#L130-L133

I tested this with my logs, and it worked.

ptmcg commented 1 day ago

Yeah, I was just looking at that myself. :) It turns out that the fromisoformat method is tolerant of variations, but only as of Python 3.11. https://docs.python.org/3/library/datetime.html#datetime.datetime.fromisoformat

I need to finish some day job stuff this afternoon/evening, so it will need to wait a day or two. (Test submissions are always welcome PRs, btw.)

RTnhN commented 1 day ago

Ah, okay. I am on python 3.10 right now. My raspberry pis run python 3.11, and it works as expected there. It would be really nice to have reverse compatibility though.

Great! If I have some time, I will make some more test submissions.

Have you thought about implementing some CI/CD to build and test? For example, you can have it test the build on multiple versions of python: https://docs.github.com/en/actions/use-cases-and-examples/building-and-testing/building-and-testing-python

It looks like since you use tox for testing, you could just do this: https://docs.github.com/en/actions/use-cases-and-examples/building-and-testing/building-and-testing-python#running-tests-with-tox

ptmcg commented 1 day ago

I do this for some other open source projects of mine (pyparsing, and maybe littletable), this one just hasn't reached critical mass yet.

Yes, logmerger is supposed to be compatible back to Python 3.9, so I have toxconfigured to test those versions - if only there were some tests for those other formats...

ptmcg commented 1 day ago

Unit tests could be written without having to have lots of files just by directly testing the TimestampedLineTransformer.make_transformer_from_sample_line function to get a transformer, and then using that transformer to get an expected datetime.

RTnhN commented 1 day ago
  • if only there were some tests for those other formats...

:) I'll try to get a PR ready for other formats this weekend.

ptmcg commented 1 day ago

Please follow the pattern of other tests using pytest and pytest.mark.parametrize. And staying at the unit level will make the test more self-contained (won't require lots of extra supporting files).

I envision a unit test that takes a string of any of the supported formats and the expected resulting datetime value, calls TimetstampedLineTransformer.make_transformer_from_sample_line to get a transformer, and then passes the test string to it to get a computed datetime instance, which should match the given expected datetime.

Once we get past this step, the remainder of logmerger will work with the resulting datetimes, so that code is isolated from any particular timestamp format.

You might even give ChatGPT a try at generating such a test - I've found its not too terrible at writing test code.