Open cordlandwehr opened 8 months ago
Sorry for late response. I tried similar in #32375, but I cannot reproduce the issue with the current git HEAD. I cannot find any significant difference in sd-journal between e5191faf44bfcb709b1452890ef3fec9a7428055 and the current git HEAD (5c81de98fcb533c0889ed6c6f6cd8640bb626360).
Could you try to add reproducible test case in our test suite??
Thank you for testing! I will check on my side within this week and update the status (currently I am on travel)
@yuwata I did a quick check and still see the problem with my attached test program. I will need to take a deeper look into reproducing this issue in your test suite, since I am not sure yet what is triggering it.
Unfortunately I am not much closer to a boiled down test case in the systemd-testsuite, but I am beginning to start better understanding what is special about my test data file that creates the problem for me:
I am not sure how much deeper I can get into this, because this handles internal journal file information that I am not too familiar with. But still it seems to be wrong that after locating a cursor the follow up next() operation does not advance to the cursor's entry. @yuwata is there more I can do here to help in analyzing this behavior? for a test case it seems that there would a need to extend the test suite to have a multi file test setup for different boots.
Hm? Nitpick, but you lost 1
beginning of the time stamp in the above comment?
615,649,937,346,197
1,615,649,817,608,252
So, the realtime timestamp of the first entry of the targeting boot is older than the one of the last(?) entry of the previous boot?
The system that generates the test journal file doesn't have RTC battery??
I obtained those journals on an Raspberry Pi 4 and to my knowledge there is no RTC (it also has not internet connection and so also has no NTP time sync in place).
Hm? Nitpick, but you lost
1
beginning of the time stamp in the above comment?615,649,937,346,197 1,615,649,817,608,252
right, I missed the 1 in the beginning! correct values are:
1615649937346197
1615649817608252
You can find the full logs in my attached testcase. I opened it with journalctl -o json -D .
and looked at the last log from the first boot and the first log from the second boot (that one with the stated cursor) and outputs are in the exact following sequence:
{"__SEQNUM":"1326","_HOSTNAME":"raspberrypi4","_UID":"0","PRIORITY":"6","_BOOT_ID":"68f2e61d061247d8a8ba0b8d53a97a52","SYSLOG_FACILITY":"3","__REALTIME_TIMESTAMP":"1615649817608252","SYSLOG_IDENTIFIER":"systemd-journald","_COMM":"systemd-journal","__CURSOR":"s=c485fef5d17c4272a4a539c4e4708f9e;i=52e;b=68f2e61d061247d8a8ba0b8d53a97a52;m=312db57e;t=5bd6cc975c03c;x=8ea609d020e72445","_MACHINE_ID":"83fc99b40aab448f8004215d83cb3f66","_PID":"682","_CAP_EFFECTIVE":"25402800cf","__MONOTONIC_TIMESTAMP":"825079166","_SYSTEMD_CGROUP":"/system.slice/systemd-journald.service","_CMDLINE":"/lib/systemd/systemd-journald","MESSAGE":"Journal stopped","MESSAGE_ID":"d93fb3c9c24d451a97cea615ce59c00b","_GID":"0","_SYSTEMD_UNIT":"systemd-journald.service","_EXE":"/lib/systemd/systemd-journald","_TRANSPORT":"driver","__SEQNUM_ID":"c485fef5d17c4272a4a539c4e4708f9e","_SYSTEMD_SLICE":"system.slice","_SYSTEMD_INVOCATION_ID":"1d760163583e471baae2baa2162af062"}
{"_GID":"994","_SYSTEMD_CGROUP":"/system.slice/systemd-timesyncd.service","__SEQNUM":"401","__MONOTONIC_TIMESTAMP":"4074953","_CMDLINE":"/lib/systemd/systemd-timesyncd","SYSLOG_IDENTIFIER":"systemd-timesyncd","__REALTIME_TIMESTAMP":"1615649817265073","_SYSTEMD_SLICE":"system.slice","_BOOT_ID":"27acae2fe35a40ac93f9c7732c0b8e59","__SEQNUM_ID":"df3342d6d57b442da21c78027d3991f8","_TRANSPORT":"journal","PRIORITY":"6","_UID":"995","_HOSTNAME":"raspberrypi4","_EXE":"/lib/systemd/systemd-timesyncd","_SOURCE_REALTIME_TIMESTAMP":"1581090654362242","_PID":"188","__CURSOR":"s=df3342d6d57b442da21c78027d3991f8;i=191;b=27acae2fe35a40ac93f9c7732c0b8e59;m=3e2dc9;t=5bd6cc97083b1;x=d775661eeb273df0","CODE_FILE":"src/timesync/timesyncd.c","_MACHINE_ID":"83fc99b40aab448f8004215d83cb3f66","_SYSTEMD_INVOCATION_ID":"79f8b70fce494cb5a196973e561f2888","_COMM":"systemd-timesyn","CODE_LINE":"81","_SYSTEMD_UNIT":"systemd-timesyncd.service","MESSAGE":"System clock time unset or jumped backwards, restoring from recorded timestamp: Sat 2021-03-13 15:36:57 UTC","SYSLOG_FACILITY":"9","CODE_FUNC":"load_clock_timestamp","_CAP_EFFECTIVE":"2000000"}
So yes, the realtime timestamp of the first log entry from second boot is before the last realtime of the first boot.
Actually, we have not tested such case. I will take a look.
BTW, after 3c85d3fda50f71755aa276cfa60807c315bfc04b, the issue should be mitigated.
Before the commit, always new journal file is created on boot. After the commit, entries from a new boot will be appended to the existing online journal file.
So, in most situations compare_with_location()
and compare_locations()
compares location based on the seqnum ID.
@cordlandwehr You said you tried the current git HEAD. The journal files you tested are generated with an older release of systemd? And you read the journal files by the reproducer with newer libsystemd.so? Could you try to generate journal files with the current git HEAD? That is, please also update systemd-journald.
Yes, that is the exact setup. Rewording it in my words and giving more context about the use case:
I will have a look if I still see issues with an updated systemd on the embedded device where the logs are being generated. Yet considering already existing logs that are generated with older systemd versiond, it would be great to also have a solution for them.
I have no good idea about how to handle journal files generated with broken journald (v254 and v255).
And, as already commented, the issue should be mostly solved by 3c85d3fda50f71755aa276cfa60807c315bfc04b. As the first step, let's backport the commit to v254 and v255.
systemd version the issue has been seen with
255 (tested also on git main: e5191faf44bfcb709b1452890ef3fec9a7428055)
Used distribution
Ubuntu 22.04
Linux kernel version used
No response
CPU architectures issue was seen on
x86_64
Component
systemd-journald
Expected behaviour you didn't see
Please see the attached small test program, only consisting of a main.cpp and an example journal archive (path to extracted journal needs to be adapted!)
This is a simplified version of a test-case I noticed on our CI with systemd 255. It essentially does the following:
I see that with systemd < 253 this program nicely exists without errors; with latest git main (as well as release 255) it fails. I did a git bisect to isolate the behavior change and can say that exactly commit 262299dccbb09f36c8c830dabd6a104469d9852b is the first that makes this program fail.
journald_cursor_issue.tar.gz
Unexpected behaviour you saw
For a cursor from which I know that it is present in an archive, it does not become current after seeking it.
Steps to reproduce the problem
See the attached test program (from error description area)
Additional program output to the terminal or log subsystem illustrating the issue