kro-cat closed this issue 1 year ago
Quick and dirty implementation I did to mitigate this temporarily.
diff --git a/pkg/lvm/lvm_util.go b/pkg/lvm/lvm_util.go
index 0bb4877..e9b52ed 100644
--- a/pkg/lvm/lvm_util.go
+++ b/pkg/lvm/lvm_util.go
@@ -17,6 +17,7 @@ limitations under the License.
package lvm
import (
+ "bytes"
"encoding/json"
"fmt"
"os"
@@ -663,11 +664,16 @@ func ListLVMVolumeGroup(reloadCache bool) ([]apis.VolumeGroup, error) {
"--units", "b",
}
cmd := exec.Command(VGList, args...)
- output, err := cmd.CombinedOutput()
+ var execOut bytes.Buffer
+ // var execErr bytes.Buffer
+ cmd.Stdout = &execOut
+ // cmd.Stderr = &execErr // do something with this useful information
+ err := cmd.Run()
if err != nil {
klog.Errorf("lvm: list volume group cmd %v: %v", args, err)
return nil, err
}
+ output := execOut.Bytes()
return decodeVgsJSON(output)
}
@kro-cat Thank you for reporting the issue. Can it be reproduced easily? If yes, please help us with the steps.
Since you have found the fix for it also. Would you like to raise a PR for this?
@abhilashshetty04 Of course! I'll have a PR up in a bit.
I'm convinced this is a niche issue. My current hypothesis is that it's due to the fact that the host machine has a different lvm system_id than the lvm-driver container. When lvm detects a foreign system id on a volume group, it will print a warning on STDERR; this is the kind of message that led me to this issue.
Since the system id can be set when using `vgcreate`, this shouldn't necessarily cause problems for volume groups meant to be managed by lvm-localpv so long as I'm aware of the correct system id to set (I think, by default, it's an empty string).
I haven't tested that yet, so I can't be certain that it's entirely correct.
By "in a bit" I guess I really meant to say "when I'm confident it actually solves the problem I'm having instead of just hiding it"
@kro-cat , Thanks for replying. We pre-provision the vg (volume group) anyway, right? So what do you mean when you say lvm-driver detects a vg not managed by itself?
@abhilashshetty04
The daemonset can see any of the host's lvm devices due to its `/dev` hostPath mount.
I'll try my best to explain why that might cause a problem.
In my particular use case, the host uses lvm logical volumes for its own filesystem (/var, /home, stuff like that). These volume groups cannot be managed by lvm-localpv, so I don't include them in its configuration. Additionally, the host's lvm.conf has been configured beyond defaults; notably, the hosts each have a unique systemid configured. These are the kinds of volumes whose volume groups create this issue.
When the lvm-driver pod executes `vgs`, its intention is to discover the volume groups it's configured to manage. However, because the host's own volume groups (and lvm.conf) are configured outside of lvm-driver, this has the potential to cause `vgs` (which shows all volume groups in `/dev`) to display warnings or errors about the differences or incompatibilities (foreign systemid, no lvmlockd, and others).
I think I should follow up on my previous comment. The issue here is that lvm-driver crashes unexpectedly when warning messages are mixed with JSON output from `vgs`.
I've put a lot of effort into explaining why the warning messages are produced, but that's not the point of this issue, nor is it necessarily an issue with lvm-localpv itself.
The PR for this issue addresses the unexpected crashing: split the STDERR output from `vgs` and print it to the log. This provides much-needed diagnostic information while not affecting the health of the daemonset.
@kro-cat @Abhinandan-Purkait I have the same issue with `lvs` and `pvs`. I think all commands should use `RunCommandSplit` instead of `CombinedOutput`.
@Mjolkhare I think I'll set up a PR this weekend for that since I never did that refactor in pull #250, unless of course someone beats me to it.
@kro-cat Thanks for being on top of this.
I recently ran into a bug which is causing the lvm driver to crash unexpectedly on my systems.
I've traced it to `pkg/lvm/lvm_util.go` line 666, in `ListLVMVolumeGroup()`.
STDERR and STDOUT are combined via `cmd.CombinedOutput()`. The `vgs` program may produce non-critical warning messages on STDERR which don't affect the operation of this driver. Warning messages also don't affect the return code of the `vgs` program. Any warning messages printed to STDERR will fail the JSON decode step in `decodeVgsJSON()`.