Closed liutgnu closed 1 month ago
I will close this MR because v2 has been posted in [1]. Thanks!
Sounds good, thanks Tao.
[test the email reply to github]
On Tue, 18 Jun 2024 at 17:28, liutgnu @.***> wrote:
Closed #8 https://github.com/rhkdump/kdump-utils/pull/8.
Motivation
People usually do not test whether kdump can actually generate a vmcore before regarding kdump as working. As a result, no vmcore may be generated after a real system crash, which is unexpected for kdump.
It is highly recommended to test kdump after any system modification, such as:
a. after kernel patching or a full yum update, which might break something kdump depends on, for example by introducing a new bug.
b. after any hardware-level change, such as storage, networking, or a firmware upgrade.
c. after deploying any new application, for example one that involves 3rd party modules.
Although these cases are beyond the scope of kdump, a simple test notification is good to have for now.
Design
Kdump currently checks whether any related files, filesystems, or drivers have been modified in order to determine whether the initrd should be rebuilt on (re)start. A rebuild indicates a modification, so kdump needs to be tested again. A rebuild therefore clears the test status recorded in $KDUMP_STATUS.
The kdump test check happens at "kdumpctl (re)start/status" and reports the tested/untested status to users. A tested status indicates that a vmcore was previously generated successfully in the current environment, so it is more likely that a vmcore will be generated when a real crash happens.
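The two steps above (clearing the status after a rebuild, then reporting it) could be sketched roughly as follows. This is only an illustration under assumptions: the function names reset_test_status and report_test_status and the printed messages are hypothetical and not taken from the patch, "clear" is assumed to mean marking the record as untested, and the status file is assumed to hold a single record whose last field is "tested" or "untested".

```shell
# Hypothetical sketch, not code from the patch.
# Assumption: the status file holds one record of four space-separated
# fields, and the last field is "tested" or "untested".

# Mark the stored record as untested, e.g. after an initrd rebuild.
reset_test_status() {
    local status_file="$1" target vmcore ts status
    # Nothing to reset if the file is missing or empty.
    [ -s "$status_file" ] || return 0
    read -r target vmcore ts status < "$status_file"
    echo "$target $vmcore $ts untested" > "$status_file"
}

# Print the test status for the user (message text is made up).
report_test_status() {
    local status_file="$1"
    if [ -f "$status_file" ] && grep -q ' tested$' "$status_file"; then
        echo "kdump: tested"
    else
        echo "kdump: untested"
    fi
}
```

With such helpers, a (re)start path would call reset_test_status "$KDUMP_STATUS" whenever the initrd was rebuilt, and both (re)start and status would end with report_test_status "$KDUMP_STATUS".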
$KDUMP_STATUS records the newest vmcore and the test status. The format looks like:
root@1.2.3.4:/var/crash 127.0.0.1-2024-05-01-15:54:29/vmcore 1714550071 untested
This means that the vmcore saved at this path with this timestamp is the newest one, and that kdump is not tested. If another vmcore with a larger (newer) timestamp is later found in the same path, the newer vmcore will be recorded in $KDUMP_STATUS and the status will be marked as tested. (Note: This assumes that the newer vmcore was generated by the current machine. If not, the kdump test status is incorrect, see the following concurrent test case:
The IP address is not a reliable way to match vmcores to their machines, for example with an ssh dump through a NAT network. Extra code would be needed to implement this. Besides, I personally think concurrent kdump tests on multiple machines are rare, so only serial kdump testing is supported for now.)
The detailed updating/checking rules can be found in check_kdump_tested().
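As a rough illustration of the update rule described above, here is a minimal sketch. It is not the actual check_kdump_tested(): the helper name update_status_record is hypothetical, and the record layout (dump target, vmcore path, timestamp, status, separated by spaces) is assumed from the example record shown earlier.

```shell
# Hypothetical helper, not the real check_kdump_tested().
# Assumed record layout, taken from the example record:
#   <dump target> <vmcore path> <timestamp> <tested|untested>

# update_status_record RECORD NEW_VMCORE NEW_TIMESTAMP
# Prints the record that should be stored after a vmcore with the given
# path and timestamp has been found under the same dump target.
update_status_record() {
    local record="$1" new_vmcore="$2" new_ts="$3"
    local target vmcore ts status

    # Split the record into its four space-separated fields.
    read -r target vmcore ts status <<EOF
$record
EOF

    if [ "$new_ts" -gt "$ts" ]; then
        # A newer vmcore exists: remember it and mark kdump as tested.
        echo "$target $new_vmcore $new_ts tested"
    else
        # Nothing newer was found: keep the record unchanged.
        echo "$record"
    fi
}
```

For example, calling update_status_record with the record from the example, a vmcore path of 127.0.0.1-2024-05-02-10:00:00/vmcore, and a timestamp of 1714615200 prints "root@1.2.3.4:/var/crash 127.0.0.1-2024-05-02-10:00:00/vmcore 1714615200 tested". A timestamp that is not larger than 1714550071 leaves the record as it was.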