goss-org / goss

Quick and Easy server testing/validation
https://goss.rocks
Apache License 2.0
5.5k stars 470 forks

Facing Error exec: not started while executing tests of type command #854

Closed ns-rkathrecha closed 7 months ago

ns-rkathrecha commented 8 months ago

Describe the bug: Our tests of type command are failing at random with the following output: stdout: Error exec: not started. We have around 100 stacks in our environment, and on each stack about 1000 test cases on average run at a scheduled interval. Among these, a few command-execution tests occasionally fail with the error above. The error is temporary and clears on the next run, but it recurs frequently and at random. Our test cases run on both VMs and Kubernetes pods, but the issue occurs only on Kubernetes pods.

How To Reproduce: We don't have exact steps to reproduce this issue because it happens at random. Running a larger number of test cases in parallel may reproduce it.

Expected Behavior: Test cases should not fail with this kind of random error.

Actual Behavior: Test cases fail at random with the following output: stdout: Error exec: not started

Environment:

aelsabbahy commented 8 months ago

Are you able to compile goss? If so, can you try the version on this branch: bug_fix

If not, can you let me know the architecture and I can upload a binary for you to test.

It won't fix the error, but should provide a more informative error which hopefully leads us to the cause.

ns-rkathrecha commented 8 months ago

We usually download the binary with this command: curl -L https://github.com/goss-org/goss/releases/latest/download/goss-linux-amd64 -o /usr/local/bin/goss. So if you can provide a binary for that architecture, that would be great. Otherwise I can go ahead and compile goss from the branch you mentioned. Let me know.

aelsabbahy commented 8 months ago

Sure, I believe this binary will provide a different error. My assumption is that a system limit is being hit somewhere, preventing the test command from even starting.

goss-linux-amd64.gz

ns-rkathrecha commented 8 months ago

Thanks! I will try with this binary and let you know the outcome.

ns-mjames commented 8 months ago

@aelsabbahy Because of https://www.suse.com/security/cve/CVE-2023-39323.html, we created a new build locally with Go version 1.20.9. Note, however, that we were already facing this issue intermittently before that.

Just want to highlight that we are facing this issue only in k8s pods and not in VMs.

aelsabbahy commented 7 months ago

Just following up on this. Have you received a different error yet? I'm very curious about the reasons starting a command may fail in a k8s pod.

ns-rkathrecha commented 7 months ago

@aelsabbahy We haven't seen this error since updating the binary. I will let you know if we face it again.

aelsabbahy commented 7 months ago

Sounds good. For what it's worth, the latest binary contains the changes: v0.4.4

ns-rkathrecha commented 7 months ago

Hi @aelsabbahy, we haven't faced this error since updating the binary, so I'm closing this issue for now. We will get back to you if we face it again. Thanks!