Closed gitRam18 closed 3 years ago
This looks strange. If the script ran successfully on the machine, ideally it should also run when triggered from Mangle. Can you share the input command and its output when you run the script on the machine?
Here is the output of the commands I ran on the same machine, with the same user, and from the same folder that I specified in Mangle as the location to copy the script to:
infra1@uklvadsb0257[DEV][~] $ cd temp
infra1@uklvadsb0257[DEV][temp] $ whoami
infra1
infra1@uklvadsb0257[DEV][temp] $ pwd
/home/infra1/temp
infra1@uklvadsb0257[DEV][temp] $ tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev eth0 root
qdisc pfifo_fast 0: dev eth0 parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev eth0 parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc noqueue 0: dev docker0 root refcnt 2
qdisc noqueue 0: dev veth608d37b root refcnt 2
qdisc noqueue 0: dev vethdc27dc7 root refcnt 2
infra1@uklvadsb0257[DEV][temp] $ tc qdisc add dev eth0 root netem delay 500ms
RTNETLINK answers: Operation not permitted
infra1@uklvadsb0257[DEV][temp] $ sudo tc qdisc add dev eth0 root netem delay 500ms
infra1@uklvadsb0257[DEV][temp] $ tc qdisc show
qdisc noqueue 0: dev lo root refcnt 2
qdisc netem 8018: dev eth0 root refcnt 5 limit 1000 delay 500.0ms
qdisc noqueue 0: dev docker0 root refcnt 2
qdisc noqueue 0: dev veth608d37b root refcnt 2
qdisc noqueue 0: dev vethdc27dc7 root refcnt 2
infra1@uklvadsb0257[DEV][temp] $
@jayasankarr1990 I tried the same thing. I think the tc command is not found because /sbin is not in PATH when it's run on the fly with ssh. When I ran $ ssh username@ip "export PATH='/sbin:$PATH';tc" it worked fine.
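For anyone who wants to apply the same workaround inside a script rather than on the ssh command line, here is a minimal sketch. The function name ensure_sbin_in_path is made up for illustration and is not part of Mangle or networkFault.sh; it assumes the tools live in /usr/sbin or /sbin, and it appends those directories so that existing PATH entries keep their lookup priority.

```shell
# Hypothetical helper, not taken from networkFault.sh.
# Appends /usr/sbin and /sbin to PATH when they are missing, so that tools
# such as tc and ip can be found in a non-interactive SSH session.
ensure_sbin_in_path() {
    for dir in /usr/sbin /sbin; do
        case ":$PATH:" in
            *":$dir:"*) ;;                # already present, nothing to do
            *) PATH="$PATH:$dir" ;;       # append, keeping existing order
        esac
    done
    export PATH
}

# Example use at the top of a script, before any precheck:
#   ensure_sbin_in_path
#   command -v tc ip
```

Calling the function twice is harmless because a directory is only added when it is not already listed.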
@lladhibhutall thanks for the feedback. We will fix this. @gitRam18 could you confirm if the workaround suggested by @lladhibhutall works for you as well?
@lladhibhutall and I tried this together. Please check whether the path needs to be /usr/sbin or /sbin, whichever is more appropriate from a security perspective, and let us know once it is fixed.
One more request I have is:
Couple of questions I have are:
@gitRam18 there is a reason for making auto-remediation mandatory for network faults. If an unusually high value is set for, say, latency, then remote access to the system under test (using PuTTY and other tools) would start failing. You would have to know where the machine is deployed and remediate manually. In such cases it is better to have auto-remediation in place. Let me know if you disagree with this reasoning.
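To make the idea concrete, here is a rough sketch of what inject-then-auto-remediate could look like with tc. This is not Mangle's actual implementation; the function names run and inject_delay and the DRY_RUN switch are assumptions for illustration. With DRY_RUN=1 (the default here) the commands are only printed, so the sketch can be tried without touching the network.

```shell
# Sketch only, not the real networkFault.sh.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# inject_delay NIC LATENCY_MS TIMEOUT_MS
inject_delay() {
    nic="$1"; latency_ms="$2"; timeout_ms="$3"
    # Round the timeout up to whole seconds for sleep.
    timeout_s=$(( (timeout_ms + 999) / 1000 ))

    # Inject: add a netem qdisc that delays packets on the NIC.
    run sudo tc qdisc add dev "$nic" root netem delay "${latency_ms}ms" || return 1
    # The fault stays active for the duration of the timeout.
    run sleep "$timeout_s"
    # Remediate: delete the root qdisc so the NIC falls back to its default.
    run sudo tc qdisc del dev "$nic" root
}

# Example (dry run): inject_delay eth0 1000 5000
```

Because the removal runs on the target machine itself, it does not depend on the remote session staying responsive while the delay is active.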
I understand, but my challenge is how to verify whether the fault has been injected or not. Can you suggest any tools to capture results before they get auto-remediated? As I mentioned, the logs are also deleted at the end.
Could you please also respond to my first question regarding latency > timeout? In that case, how is the latency effective if you remediate automatically?
One more thing: do you have the new package with the path corrections for sbin?
Please advise.
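One possible way to check this from the target machine while the fault is still active is to poll tc qdisc show and look for a netem entry on the NIC. The sketch below is only an illustration; netem_active and capture_qdisc are made-up helper names, and the one-second polling interval is an arbitrary choice.

```shell
# Hypothetical helpers for checking a netem fault from the target machine.

# Reads `tc qdisc show` output on stdin and succeeds when a netem qdisc
# is attached to the given NIC.
netem_active() {
    nic="$1"
    grep -q "qdisc netem [0-9a-f]*: dev $nic "
}

# Appends a timestamped snapshot of the NIC's qdiscs to a log file once per
# second, so the evidence survives after the fault is removed.
capture_qdisc() {
    nic="$1"; seconds="$2"; logfile="$3"
    i=0
    while [ "$i" -lt "$seconds" ]; do
        { date '+%Y-%m-%d %H:%M:%S'; tc qdisc show dev "$nic"; } >> "$logfile" 2>&1
        sleep 1
        i=$((i + 1))
    done
}

# Example use while the fault is expected to be active:
#   tc qdisc show dev eth0 | netem_active eth0 && echo "netem is active"
#   capture_qdisc eth0 10 /tmp/qdisc-eth0.log
```

Starting capture_qdisc shortly before triggering the fault gives a log that shows the netem entry appearing and then disappearing again.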
We will allow logs to remain available after remediation and fix the sbin issue in the next release. We will update the bug once it is fixed. Regarding latency > timeout: the latency will be effective until the timeout happens.
The issue is resolved with the 3.0.0 release. Please upgrade to 3.0.0 and verify.
The documentation for upgrading Mangle is available at https://vmware-1.gitbook.io/mangle/mangle-administration/supported-deployment-models#upgrading-existing-instances-of-mangle
Closing the issue as fixed in 3.0.
Describe the issue: I am trying Mangle deployed in an OpenShift environment. I am trying to inject a network fault (packet delay or packet duplication) on a remote Linux machine within my organization's network, but the networkFault script is failing with the error below:
com.vmware.mangle.utils.exceptions.MangleException: Precheck Failed with pre-requisites : tc is required, ip is required
The networkFault script was transferred to the machine successfully, and the SSH connection to execute the command also succeeded, but the command execution is failing.
Steps to reproduce:
Logs:
2020-04-24 16:25:07.640 [SystemResourceFaultTaskHelper-1587745499241] DEBUG com.vmware.mangle.task.framework.helpers.CommandInfoExecutionHelper.getAbsoluteCommand (195) - Absolute Command is /home/infra1/temp/networkFault.sh --operation=inject --faultOperation=NETWORK_DELAY_MILLISECONDS --latency=1000 --percentage=0 --nicName=eth0 --timeout=5000
2020-04-24 16:25:07.641 [SystemResourceFaultTaskHelper-1587745499241] INFO com.vmware.mangle.utils.clients.ssh.SSHUtils.runCommandReturningResult (156) - Running Command ...
2020-04-24 16:25:10.471 [SystemResourceFaultTaskHelper-1587745499241] DEBUG com.vmware.mangle.utils.clients.ssh.SSHUtils.runCommandReturningResult (165) - SSH Connected Successfully
2020-04-24 16:25:11.473 [SystemResourceFaultTaskHelper-1587745499241] DEBUG com.vmware.mangle.utils.clients.ssh.SSHUtils.runCommandReturningResult (169) - Command-output: Precheck Failed with pre-requisites : tc is required, ip is required
2020-04-24 16:25:11.474 [SystemResourceFaultTaskHelper-1587745499241] DEBUG com.vmware.mangle.utils.clients.ssh.SSHUtils.runCommandReturningResult (170) - exit-status: 127
2020-04-24 16:25:11.477 [SystemResourceFaultTaskHelper-1587745499241] ERROR com.vmware.mangle.services.tasks.executor.TaskExecutor.runTask (231) - Task Execution Failed. Reason: ErrorCode : FI0015, ErrorMessage : Execution of Command: /home/infra1/temp/networkFault.sh --operation=inject --faultOperation=NETWORK_DELAY_MILLISECONDS --latency=1000 --percentage=0 --nicName=eth0 --timeout=5000 failed. errorCode: 127 output: Precheck Failed with pre-requisites : tc is required, ip is required .
com.vmware.mangle.utils.exceptions.MangleException: Precheck Failed with pre-requisites : tc is required, ip is required
Expected Behavior: Packet delay should have been injected on to the machine and the device eth0
Additional Info: I ran a similar script (networkFault.sh) directly on the machine and was able to inject the fault. I used the same user credentials as entered in Mangle. I also verified that the tc and ip utilities are available on the machine and that I can run them manually.
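The exit status 127 in the logs is the shell's conventional "command not found" code, which suggests the tools could not be located in the environment the script ran in, even though they exist on the machine. The sketch below is a guess at what such a precheck might look like, not the real networkFault.sh code; the function name precheck is an assumption, and the message format is copied from the log output above.

```shell
# Hypothetical precheck, reconstructed from the error message in the logs.
# Reports every tool that cannot be found via PATH and returns 127.
precheck() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="${missing:+$missing, }$tool is required"
        fi
    done
    if [ -n "$missing" ]; then
        echo "Precheck Failed with pre-requisites : $missing"
        return 127
    fi
}

# Example: simulate a session whose PATH lacks the directory holding tc and ip.
#   ( PATH=/usr/bin:/bin; precheck tc ip; echo "exit-status: $?" )
```

Comparing the output of echo "$PATH" in an interactive login with the output of ssh user@host 'echo "$PATH"' for the same account should show whether the two environments differ.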