Closed liutgnu closed 2 weeks ago
Hi Tao,
Thanks for the effort for making this achievement. I have one tiny concern about this PR.
If vmcore was created successfully and the status is recorded. Then later kdump kernel didn't finish booting to the init process or run the kdump scripts. Will we still get a 'success' status for kdump?
Hi Baoquan,
Yes, it will get a 'success' status. Because there was a vmcore generated before, so there is a success status recorded. Since we don't make any change to the system, the status file will not be deleted, and we are expecting the same env will lead to a same result, aka a vmcore generated for a later crash. However when a later crash happens, the init doesn't bootup. This breaks our premise of "A 'success' status indicates previously there has been a vmcore successfully generated based on the current env, so it is more likely a vmcore will be generated later when real crash happens".
This is a problem and it cannot be solved by this patch, the patch is based on the premise, but the premise is failed.
Personally I suggest we give more time of testing the premise. If it cannot cover the majority proportion of our cases, there is no need to introduce this patch.
Hi Dave,
Thanks for the review.
[root@localhost ~]# kdumpctl restart kdump: kexec: unloaded kdump kernel kdump: Stopping kdump: [OK] kdump: kexec: loaded kdump kernel kdump: Starting kdump: [OK] kdump: Fail: No vmcore creation test performed!
Above "Fail" message causes misunderstandings, people may think that kdump service failed. As it is just a notification, not mandatory remove the "Fail:" and "Success:" looks reasonable. And also extra brief words to let people know how to do the test will be better. For example: kdump: it seems kdump not tested, you can trigger crash and test kdump via "echo c > /proc/sysrq-trigger" . Maybe see if Philipp has better suggestion about the wording.
I updated the notify message as:
[root@localhost ~]# kdumpctl restart
kdump: kexec: unloaded kdump kernel
kdump: Stopping kdump: [OK]
kdump: kexec: loaded kdump kernel
kdump: Starting kdump: [OK]
kdump: Vmcore creation notice: No vmcore creation test performed!
The Fail/Success message been changed to "Vmcore creation notice". Personally I think the "kdump: it seems kdump not tested, you can trigger crash and test kdump via "echo c > /proc/sysrq-trigger"" message is redundant. Because the message is only meaningful to the CEE people, so CEE people know how to handle the test IMHO. And we don't want to instruct ordinary people to follow the message, because crash & reboot the system is a dangerous operation if people have no context about this feature. So I guess one "No vmcore creation test performed!" message is enough. What do you think?
Thanks, Tao Liu
Please checkout the new version of v3 in 1. And I will close this one. Thanks!
Hi Tao,
... Yes, it will get a 'success' status. Because there was a vmcore generated before, so there is a success status recorded. Since we don't make any change to the system, the status file will not be deleted, and we are expecting the same env will lead to a same result, aka a vmcore generated for a later crash. However when a later crash happens, the init doesn't bootup. This breaks our premise of "A 'success' status indicates previously there has been a vmcore successfully generated based on the current env, so it is more likely a vmcore will be generated later when real crash happens".
When I talked to Dave about this, an idea comes up that we can use a specific command to trigger the test, e.g kdumpctl test, then remove the file $VMCORE_CREATION_STATUS before triggering crash. Then we will only have 'success' or 'unknown'/'untested' two status. 'success' means vmcore is saved successfully last time, 'untested/unknown' means it's not tested or vmcore is not saved successfully. Not sure if this is clearer regarding to the status.
This is a problem and it cannot be solved by this patch, the patch is based on the premise, but the premise is failed.
Personally I suggest we give more time of testing the premise. If it cannot cover the majority proportion of our cases, there is no need to introduce this patch.
Hi Baoquan,
Hi Tao,
... Yes, it will get a 'success' status. Because there was a vmcore generated before, so there is a success status recorded. Since we don't make any change to the system, the status file will not be deleted, and we are expecting the same env will lead to a same result, aka a vmcore generated for a later crash. However when a later crash happens, the init doesn't bootup. This breaks our premise of "A 'success' status indicates previously there has been a vmcore successfully generated based on the current env, so it is more likely a vmcore will be generated later when real crash happens".
When I talked to Dave about this, an idea comes up that we can use a specific command to trigger the test, e.g kdumpctl test, then remove the file $VMCORE_CREATION_STATUS before triggering crash. Then we will only have 'success' or 'unknown'/'untested' two status. 'success' means vmcore is saved successfully last time, 'untested/unknown' means it's not tested or vmcore is not saved successfully. Not sure if this is clearer regarding to the status.
Thanks for the comments, now I have worked out the v4 which implemented the testing entrance as "kdumpctl test", which will remove $VMCORE_CREATION_STATUS before triggering crash. Please checkout the final force-pushed commits in 1.
This is a problem and it cannot be solved by this patch, the patch is based on the premise, but the premise is failed. Personally I suggest we give more time of testing the premise. If it cannot cover the majority proportion of our cases, there is no need to introduce this patch.