Closed liutgnu closed 1 month ago
Hi Tao, it is weird that the email notification still keeps the old success/fail description sentences. Though it is V3, there is no way to reply inline, so let me put a comment on the web UI here: "kdump: Vmcore creation notice: No vmcore creation test performed!" The above notice looks redundant, mentioning "vmcore creation" twice. I suggest just "notice: no vmcore creation test performed!" or "Warning: no vmcore creation test performed".
And "notice: Last successful vmcore creation on Tue Jun ..." or "Last successful vmcore creation on Tue Jun ...".
Also, how about changing "vmcore creation" to "kdump test"? Ditto for the status file name. It sounds more straightforward and easier to understand.
Hi Tao, besides the above, I noticed that there is a special case for dump_to_rootfs in mkdumprd, so some testing and verification with fadump is also necessary.
Hi Dave,
Agreed, just notice: Last successful vmcore .... is enough, will get it improved in v4.
Hi Tao, besides the above, I noticed that there is a special case for dump_to_rootfs in mkdumprd, so some testing and verification with fadump is also necessary.
Yes, will do it. Currently there is an OOM issue in kexec-tools which blocks my fadump testing, see RHEL-35516.
Hi @liutgnu,
I'm curious to ask why you prefer creating a new PR for each new version? From the maintainer's perspective, one benefit of pushing the new version to the same PR is that I can compare the difference between two versions with one click, i.e. by clicking the "Compare" hyperlink.
Please tell me which way you prefer. On one hand, since "force push" is dangerous and should be avoided, I guess a separate MR for each patch version (pushed from a different branch) is the safer option. On the other hand, "force push" against a single MR as you described may have the following issue:
1: patch v1 (MR created)
     ↓
2: force push 1 (to improve v1)
     ↓
3: force push 2 (to improve v1 again; now v1 is OK for review)
     ↓
4: force push 3 (received comments, drafting v2)
     ↓
5: force push 4 (to improve v2; now v2 is ready for review)
     ↓
...
You see, there might be plenty of "force pushes". I guess what you want to view is the diff of 3 & 5. I don't know if you can select the commit hashes for diff comparison; what I know is that it can only show the diff of adjoining pushes. Please correct me if I'm wrong.
Thanks for raising the concerns! If you are worried that a force-push may overwrite the history of this repo, then you needn't worry, because this repo doesn't allow force-push, which is also the default setting.
As for the case of comparing versions 3 & 5, you can simply send the new version of the patches in one single force-push to avoid this case. Btw, you can also compare different commits directly, e.g. compare your v3 with v2.
So I prefer sending new versions of patches to the same PR. Another benefit is that we keep the same context in the same PR.
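The direct comparison of two force-pushed versions can also be done locally with git range-diff. The snippet below builds a throwaway demo repo to show the idea; with a real PR you would substitute the old and new head commits recorded in the force-push activity line:

```shell
# Demo: compare two versions of the "same" patch with git range-diff.
# The repo here is a throwaway; only the final range-diff call matters.
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git -c user.email=a@b.c -c user.name=a commit -q --allow-empty -m base
git branch v1
git branch v2
git checkout -q v1
echo one > f && git add f
git -c user.email=a@b.c -c user.name=a commit -qm "patch: add f"
git checkout -q v2
echo two > f && git add f
git -c user.email=a@b.c -c user.name=a commit -qm "patch: add f"
git range-diff v1...v2    # shows how the same patch changed between v1 and v2
```

range-diff pairs up the patches of the two series and shows an intra-patch diff, which is exactly the "diff of 3 & 5" case discussed above.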
The v4 has been force-pushed to this PR; it mainly introduces "kdumpctl test" as the entry point for performing the kdump test. @daveyoung @baoquan-he
In addition, in v3 Dave mentioned combining /kdumproot and /sysroot in kdump-lib.sh:get_kdump_mntpoint_from_target(), in order to simplify the dir mounting for saving $VMCORE_CREATION_STATUS. However, an experiment I did shows negative results; see the following:
If we unify the two, with kdump dumping through nfs and the failure action dumping to rootfs, the mount arguments will be like:
Arguments: --add 'kdumpbase' --quiet --hostonly --hostonly-cmdline --hostonly-i18n --hostonly-mode 'strict' --hostonly-nics '' -o 'plymouth resume ifcfg earlykdump' --mount '192.168.1.65:/tmp/nfs /sysroot nfs defaults' --squash-compressor 'zstd' --mount '/dev/mapper/rhel_ibm--p8--kvm--03--guest--02-root /sysroot xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota'
The original mount arguments will be like:
Arguments: --add 'kdumpbase' --quiet --hostonly --hostonly-cmdline --hostonly-i18n --hostonly-mode 'strict' --hostonly-nics '' -o 'plymouth resume ifcfg earlykdump' --mount '192.168.1.65:/tmp/nfs /kdumproot nfs defaults' --squash-compressor 'zstd' --mount '/dev/mapper/rhel_ibm--p8--kvm--03--guest--02-root /sysroot xfs rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota'
When the nfs mount fails, kdump tries to mount sysroot via "systemctl start --no-block sysroot.mount", which mounts nfs again instead of mounting the block device. So the mount fails again, and dump_to_rootfs fails, which is unexpected.
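The failure mode can be pictured with the mount unit the kdump initramfs would end up with under the unified scheme. This is an illustrative reconstruction, not the actual generated unit:

```ini
# sysroot.mount inside the kdump initramfs (illustrative reconstruction).
# With both --mount entries pointing at /sysroot, the nfs entry is the one
# systemd resolves, so the dump_to_rootfs fallback re-attempts nfs instead
# of the block device.
[Mount]
What=192.168.1.65:/tmp/nfs
Where=/sysroot
Type=nfs
Options=defaults
```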
see the 2nd kernel log:
[ 130.266759] mount[563]: mount.nfs: Connection refused
[ 130.268887] systemd[1]: sysroot.mount: Mount process exited, code=exited, status=32/n/a
[FAILED] Failed to mount /sysroot.
See 'systemctl status sysroot.mount' for details.
[ 130.272324] systemd[1]: Failed to mount /sysroot.
...
[ 256.480662] kdump[700]: failed to dump to '/sysroot', it's not a mount point!
[FAILED] Failed to start Kdump Vmcore Save Service.
See 'systemctl status kdump-capture.service' for details.
...
[ 257.293836] systemd[1]: sysroot.mount: Directory /sysroot to mount over is not empty, mounting anyway.
[ 257.296748] systemd[1]: Mounting /sysroot...
[ 257.304670] mount[776]: mount.nfs: rpc.statd is not running but is required for remote locking.
...
[ 347.314093] kdump[981]: Failed to mount rootfs
[ 347.316864] kdump[983]: Executing final action systemctl reboot -f
Hi Tao, I finally found your reply here. Thanks for trying the idea. It is fair enough if you cannot use one mount point. Anyway, I still have some questions about the mounting; I added another comment on your new version.
Hi Dave, I just force-pushed the v5 a few hours ago. I tested the fadump case and mainly fixed its error mounting issues. Please check out the added code comments in mkdumprd.
Hi Philipp,
Thanks a lot for your careful review!
Hi Tao, here are some comments from my first round. I still need to think about a few things, e.g. I don't know if it really makes sense to store the status file on the rootfs. My first thought was that it would be better to store it on the target, next to the dumps instead. IMHO, that would make tracking the status easier, when you have a big fleet dumping to a remote host, e.g. via ssh. But I'm not entirely sure if we can assume that the target is always mounted/accessible. Especially when dumping to nfs.
Yes, saving the status file onto the target was my first option, but I gave it up for two reasons: 1) It is difficult to identify who created the vmcore/status file. Say we have an nfs/ssh server; several machines will dump vmcore/status files to it. Since we need to verify whether the kdump test is successful for one specific machine, we need to identify that the vmcore/status file was created by it, not by others. Frankly there is no stable way to do that; the ip+date string isn't a good method, e.g. in the following case: physical machine_a hosts vm1 and vm2 on its NAT network, physical machine_b hosts vm3 and vm4 on its NAT network, and physical machine_c is for vmcore hosting. So vm1 and vm3 can have the same ip address, 10.0.2.15. How can we know whether one vmcore/status file belongs to vm1 or vm3? My previous solution was to write a special string (a timestamp) into the kernel log before triggering the kernel crash, and check whether the string exists in the vmcore-dmesg.txt file to identify its owner.
2) Though [1] is doable, it needs to access the vmcore server frequently (on each check of whether the kdump test succeeded). QE/customers don't want us to access the vmcore server too much unless we are dumping a vmcore to it. In addition, accessing a remote server is slow, especially when there are hundreds of vmcores, which means we would need to find the special string within hundreds of vmcore-dmesg.txt files. It is slow, and the nfs server may not be mounted as well...
So saving the status file locally is fast and has no problem determining ownership, compared with saving remotely.
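For reference, the marker method from 1) could be sketched as below. Every function name and path here is illustrative, not the actual kdump-utils implementation:

```shell
# On the crashing machine, the marker would be injected into the kernel log
# (so it ends up in vmcore-dmesg.txt) right before the test crash, e.g.:
#   echo "$MARKER" > /dev/kmsg
#   echo c > /proc/sysrq-trigger

# On the dump server side, find which dump directory holds our marker:
find_own_dump() {
    local marker="$1" crash_dir="$2"
    grep -l "$marker" "$crash_dir"/*/vmcore-dmesg.txt 2>/dev/null
}

# Demo with fake dump directories (two VMs that shared the same NAT ip):
tmp=$(mktemp -d)
mkdir -p "$tmp/10.0.2.15-2024-07-11-07:54:41" "$tmp/10.0.2.15-2024-07-11-08:10:03"
echo "marker: kdump-test-1720680881" > "$tmp/10.0.2.15-2024-07-11-07:54:41/vmcore-dmesg.txt"
echo "unrelated boot messages"       > "$tmp/10.0.2.15-2024-07-11-08:10:03/vmcore-dmesg.txt"
find_own_dump "kdump-test-1720680881" "$tmp"   # prints the matching dump dir's log
```

The downside is exactly what 2) describes: the grep has to run against every vmcore-dmesg.txt on the server, on every status check.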
Right, ip+date is a pretty poor method to identify the system that dumped. My idea was to replace the ip with the machine-id. The machine-id is determined at the first boot after installation and then stays stable forever, at least for "normal" installations. I'm not 100% sure how it behaves for container images where the rootfs is ro. If I understand the man page correctly, the machine-id is then determined every time on boot, so it won't be stable. But in that scenario the ip probably wouldn't be stable either. So we could switch from ip+date to machine-id+date for the dumps and machine-id.status for the status files. Then we also don't need to check all the vmcore-dmesg.txt files.
What do you think?
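The proposed naming scheme could look roughly like this; the helper functions are made up for illustration and are not kdump-utils code:

```shell
# Sketch of machine-id+date naming for dumps and machine-id.status for
# status files, per the suggestion above. Illustrative only.
dump_dir_name() {
    local machine_id="$1"
    printf '%s-%s\n' "$machine_id" "$(date +%Y-%m-%d-%T)"
}

status_file_name() {
    printf '%s.status\n' "$1"
}

id="2f0c3deacb234a62"     # would normally come from /etc/machine-id
dump_dir_name "$id"       # e.g. 2f0c3deacb234a62-2024-07-11-07:54:41
status_file_name "$id"    # prints: 2f0c3deacb234a62.status
```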
2. Though [1] is doable, it needs to access the vmcore server frequently (on each check of whether the kdump test succeeded). QE/customers don't want us to access the vmcore server too much unless we are dumping a vmcore to it. In addition, accessing a remote server is slow, especially when there are hundreds of vmcores, which means we would need to find the special string within hundreds of vmcore-dmesg.txt files. It is slow, and the nfs server may not be mounted as well...
True, but at least for ssh targets we already connect to them every time we start, to check whether the network is up and running. So in that case there won't be too many additional connections. But then there is "kdumpctl status", where we don't know whether we can reach the target. And of course nfs targets...
So saving the status file locally is fast and has no problem determining ownership, compared with saving remotely.
True, but it also means that we need to mount a disk locally in cases where we didn't need it before, which will require additional memory and thus increases the chance that the kdump kernel runs OOM...
Right, ip+date is a pretty poor method to identify the system that dumped. My idea was to replace the ip with the machine-id. The machine-id is determined at the first boot after installation and then stays stable forever, at least for "normal" installations. I'm not 100% sure how it behaves for container images where the rootfs is ro. If I understand the man page correctly, the machine-id is then determined every time on boot, so it won't be stable. But in that scenario the ip probably wouldn't be stable either. So we could switch from ip+date to machine-id+date for the dumps and machine-id.status for the status files. Then we also don't need to check all the vmcore-dmesg.txt files.
What do you think?
I guess we cannot do that; the machine-id should be encrypted and shouldn't be exposed to the outside. See "man machine-id", as I quote some of it:
This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network...
If we do use some hash/encrypt approach before using the machine-id, I guess it will introduce extra complexity to the kdump function.
True, but at least for ssh targets we already connect to them every time we start, to check whether the network is up and running. So in that case there won't be too many additional connections. But then there is "kdumpctl status", where we don't know whether we can reach the target. And of course nfs targets...
Yeah, since CEE will use "kdumpctl test" often, and I guess nfs targets are their priority, we need to come up with a universal approach, not only for local/ssh but also for nfs targets...
True, but it also means that we need to mount a disk locally for cases where we didn't need it before. Which will require additional memory and thus increases the chance that the kdump kernel runs oom...
Yes, agreed, but I guess it is OK to increase the crashkernel size a little bit to support this.
I guess we cannot do that; the machine-id should be encrypted and shouldn't be exposed to the outside. See "man machine-id", as I quote some of it:
This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network...
Yeah, I read that as well. But I believe that for our use case we can assume that the dump target can be trusted even when it's a remote target. The reason is that if we cannot trust the dump target, we cannot write the dump there either. In the end a kernel dump will always contain a lot of confidential information, so if we write a dump to an untrusted target, the machine-id is one of the smaller problems we have.
If we do use some hash/encrypt approach before using the machine-id, I guess it will introduce extra complexity to the kdump function.
Fully agree, but as said above IMHO that shouldn't be necessary.
Yeah, since CEE will use "kdumpctl test" often, and I guess nfs targets are their priority, we need to come up with a universal approach, not only for local/ssh but also for nfs targets...
True...
Yes, agreed, but I guess it is OK to increase the crashkernel size a little bit to support this.
Yes, I guess so...
Now v7 has been posted.
Yeah, I read that as well. But I believe that for our use case we can assume that the dump target can be trusted even when it's a remote target. The reason is that if we cannot trust the dump target, we cannot write the dump there either. In the end a kernel dump will always contain a lot of confidential information, so if we write a dump to an untrusted target, the machine-id is one of the smaller problems we have.
OK, agreed. However, one thing I noticed in my fedora:37 container is that the systemd package is not installed:
[root@3c6570f17fba system]# rpm -qa | grep systemd
systemd-libs-251.10-588.fc37.x86_64
So there is no systemd-machine-id-commit.service to generate the machine-id; instead we get an empty file:
[root@3c6570f17fba system]# rpm -qf /etc/machine-id
file /etc/machine-id is not owned by any package
[root@3c6570f17fba system]# ls -ail /etc/machine-id
3046437 -rw-r--r--. 1 root root 0 Feb  5  2023 /etc/machine-id
I guess machine-id is not a more stable approach for containers than ip+date...
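If machine-id naming were adopted, the empty-file case above would need a guard. A hedged sketch of such a fallback (function name and fallback policy are illustrative only, not kdump-utils code):

```shell
# Use the machine-id when present and non-empty; otherwise fall back to an
# ip-style label, since an empty /etc/machine-id (as in the fedora:37
# container above) would produce useless names. Illustrative only.
dump_label() {
    local id_file="$1" fallback="$2" id
    id=$(cat "$id_file" 2>/dev/null)
    if [ -n "$id" ]; then
        printf '%s\n' "$id"
    else
        printf '%s\n' "$fallback"
    fi
}

# Demo: an empty machine-id file, as seen in the container case
empty=$(mktemp)
dump_label "$empty" "10.0.2.15"    # prints the ip fallback: 10.0.2.15
```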
Hi @prudo1, kindly ping. Could you please review the patch? If you have no time, we can ask others for help. Thanks!
Hi @prudo1 ,
Hi Philipp, Thanks a lot for your careful review!
Hi Tao, here are some comments from my first round. I still need to think about a few things, e.g. I don't know if it really makes sense to store the status file on the rootfs. My first thought was that it would be better to store it on the target, next to the dumps instead. IMHO, that would make tracking the status easier, when you have a big fleet dumping to a remote host, e.g. via ssh. But I'm not entirely sure if we can assume that the target is always mounted/accessible. Especially when dumping to nfs.
Yes, saving the status file onto the target was my first option, but I gave it up due to the 2 reasons: 1) It is difficult to identify 'who created the vmcore/status file'. Say if we have a nfs/ssh server, several machines will dump vmcore/status file to it. Since we need to verify if kdump test is successful for one specific machine, we need to identify the vmcore/status file is created by it, no by others. Frankly there is no stable way to do that, the ip+date string isn't a good method, e.g the following case: physical machine_a hosted vm1 and vm2 by its NAT network, physical machine_b hosted vm3 and vm4 by its NAT network, and physical machine_c is for vmcore hosting. So vm1 and vm3 can have the same ip address as 10.0.2.15. How can we know if one vmcore/status file belongs to vm1 or vm3? My previous solution is to create a special string(a timestamp) in kernel before trigger kernel crash, and check the vmcore-dmesg.txt file if the string exists, to identify its belonging.
Right, the ip+date is a pretty poor method to identify the system that dumped. My idea was to replace the ip with the machine-id. The machine-id is determined at the first boot after installation and then stays stable for ever. At least for "normal" installations. I'm not a 100% sure how it behaves for container images where the rootfs is ro. If I understand the man page correctly the machine-id is then determined every time on boot. So it won't be stable. But in that scenario the ip probably wouldn't be stable as well. So we could switch from ip+date to machine-id+date for the dumps and machine-id.status for the status files. Then we also don't need to check all the vmcore-dmesg.txt. What do you think?
I guess we cannot do that, the machine-id should be encrpted and shouldn't be exposed to outeside. See the "man machine-id", as I quote some of it: This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network...
Yeah, I read that as well. But I believe that for our use case we can assume that the dump target can be trusted even when it's a remote target. Reason is that, if we cannot trust the dump target we cannot write the dump there as well. In the end a kernel dump will always contain a lot of confidential information. So if we write a dump to an untrusted target the machine-id is one of the smaller problems we have.
OK, agreed. However one thing I noticed from my fedora:37 container, the systemd package is not installed: [root@3c6570f17fba system]# rpm -qa|grep systemd systemd-libs-251.10-588.fc37.x86_64
So there is no systemd-machine-id-commit.service to generate the machine-id, instead we got and empty file: [root@3c6570f17fba system]# rpm -qf /etc/machine-id file /etc/machine-id is not owned by any package [root@3c6570f17fba system]# ls -ail /etc/machine-id 3046437 -rw-r--r--. 1 root root 0 Feb 5 2023 /etc/machine-id
I guess machine-id is no more stable an approach for containers than ip+date...
I re-thought the machine-id approach, and I think the container case above doesn't apply: kdump is not supposed to run within a container. However there are cases like CoreOS, which uses ostree layers for the fs. CoreOS mounts /etc as writable, i.e. the machine-id can be generated on the first boot and stays there, but I'm not sure whether there are cases which mount /etc read-only, or whether users can make such a customization. I'm a newbie to ostree :(
The ip+date string, though very poor, is better for human reading. E.g. /var/crash/10.0.2.15-2024-07-11-07:54:41 is much more readable than /var/crash/machine-id-1d4c642df43846/. The former indicates the machine is likely to be a VM, so if we are searching for a vmcore dumped by a bare-metal machine, this one is easy to skip.
So IMHO I don't have a very strong motivation for replacing ip+date with a machine-id string for vmcore folder naming for now.
Other than machine-id for vmcore labeling, Dave gave me a different solution: 1) before triggering the kernel panic, we generate a unique string and put it into a file. 2) mkdumprd includes the file in initramfs-kdump.img. 3) in the 2nd kernel, after creating the vmcore, we copy the file alongside the vmcore. 4) kdumpctl status can check whether the string alongside the vmcore is the expected one. The file can also be connected with kdump.status, if we draft the file content in some format. This method is easy and doable, if we do want to save kdump.status onto the remote machine.
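Steps 1) and 4) of that token-file solution could look roughly like the sketch below (the path and function names are made up; steps 2 and 3 would live in mkdumprd and the capture initramfs and are only described in comments):

```shell
#!/bin/sh
# Hedged sketch of the token-file idea; TOKEN_FILE is a hypothetical path,
# not the real kexec-tools location.

TOKEN_FILE=/var/lib/kdump/test-token

# Step 1: generate a unique string before triggering the test panic.
generate_token() {
    mkdir -p "$(dirname "$TOKEN_FILE")"
    # Timestamp plus PID keeps the token unique per test run.
    echo "$(date +%s).$$" > "$TOKEN_FILE"
}

# Steps 2 and 3 (not shown): mkdumprd packs $TOKEN_FILE into
# initramfs-kdump.img, and the 2nd kernel copies it beside the vmcore.

# Step 4: "kdumpctl status" compares the local token with the copy
# saved beside the vmcore, e.g. /var/crash/<dir>/test-token.
check_token() {
    cmp -s "$TOKEN_FILE" "$1"
}
```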
Personally I would prefer to save kdump.status to the local drive rather than the remote one, simply because it makes "kdumpctl status" work faster and more reliably, since we don't need to care about the network status after "kdumpctl test".
Any comments?
Thanks, Tao Liu
If we do use some hashing/encryption approach before using the machine-id, I guess it will introduce extra complexity for the kdump function.
Fully agree, but as said above IMHO that shouldn't be necessary.
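For reference, man machine-id suggests that applications which must expose an identifier derive it via a keyed hash rather than using the raw id. A minimal sketch of that derivation (the "kdump-status" key string is an invented example, not anything kexec-tools actually uses):

```shell
#!/bin/sh
# Hypothetical sketch: derive a per-application dump ID from the machine-id
# with HMAC-SHA256, so the raw machine-id is never written to the target.

derive_dump_id() {
    machine_id_file="${1:-/etc/machine-id}"
    # Stable per machine, but does not reveal the machine-id itself.
    openssl dgst -sha256 -hmac "kdump-status" < "$machine_id_file" |
        awk '{print $2}'
}
```

A dump directory could then be named e.g. /var/crash/$(derive_dump_id)-2024-07-11-07:54:41, at the cost of the readability concern raised above.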
2. Though [1] is doable, it needs to access the vmcore server frequently (on each check of whether the kdump test succeeded). QE/customers don't want us to access the vmcore server too much unless we are dumping a vmcore to it. In addition, accessing a remote server is slow, especially when there are hundreds of vmcores, which means we need to find the special string within hundreds of vmcore-dmesg.txt files. It is slow, and the nfs server may not be mounted either...
True, but at least for ssh targets we already connect to them every time we start to check whether the network is up and running. So in that case there won't be too many additional connections. But then there is
kdumpctl status
where we don't know whether we can reach the target. And of course nfs targets...

Yeah, since CEE will use "kdumpctl test" often, and I guess nfs targets are their priority, we need to come up with a universal approach, not only for local/ssh but also for nfs targets...
True...
So saving the status file locally is fast and, unlike saving it remotely, has no problem determining which machine it belongs to.
True, but it also means that we need to mount a disk locally for cases where we didn't need it before. Which will require additional memory and thus increases the chance that the kdump kernel runs oom...
Yes agreed, but I guess it is OK to increase the crashkernel size a little bit in order to support this.
Yes, I guess so...
Hi @liutgnu, this PR is almost perfect. Just a few very small nits. The biggest "problems" I see are that, with #33 merged, you need to rebase onto the latest main, as there will be some conflicts. In addition you need to add the new sub-command to the
kdumpctl.8
man page. With that fixed the PR can be merged. Thanks!
Hi @prudo1 , thanks a lot for your review. I will get those improved in v8
Hi @prudo1 , please see the v8 I just posted. Thanks in advance!