kdebisschop / rundeck-rancher-node-plugin

Manage Rancher-controlled Docker containers in Rundeck
Apache License 2.0

Inline scripts failed time to time #62

Open mashwino opened 3 years ago

mashwino commented 3 years ago

Hello kdebisschop, I hope you are doing well in this strange period :-/

I am coming back to you about a problem I have with "Inline script" steps. I started a new Rundeck instance from scratch to rule out interference from other plugins or features, added the plugin, and created a simple bash inline script that runs the hostname command. Rundeck is connected via the plugin to a Rancher instance managing about 200 containers.

The bug is that, from time to time, the script is not executed and the step fails with this error:

chmod: cannot access /tmp/117-28-custeuro_preprod-custeuro-app-com-rsl-task-pre-prod-custeuro-app-com-1-dispatch-script.tmp.sh: No such file or directory

Yet the temporary file is present on the container:

ls -lrt /tmp/117-28-custeuro_preprod-custeuro-app-com-rsl-task-pre-prod-custeuro-app-com-1-dispatch-script.tmp.sh
-rw-r--r-- 1 root root 42 14 avril 13:58 /tmp/117-28-custeuro_preprod-custeuro-app-com-rsl-task-pre-prod-custeuro-app-com-1-dispatch-script.tmp.sh

So the script was copied correctly, but for some reason it was not found when it was time to execute it. The screenshots show the same job executed twice: one run succeeded and the other failed.

[Screenshots: script_status, script_ok, script_nok]

I tried many different settings (sync cache, cache delay, refresh nodes before execute, scp versus rancher copy, include services, global containers, system containers ...), but the problem is still present.

If I use the generic command menu of the project to launch the 'hostname' command on nodes, I do not get any issue.

Any idea what could cause this behavior?

Thank you for your help.

Regards,

kdebisschop commented 3 years ago

You say "from time to time"

That implies sometimes it does work, correct?

Can you estimate the frequency of failure? Are the servers close to each other on the network? Does debug show anything? Maybe a log of a failed network connection?

Have you found anything reproducible about the failure?

mashwino commented 3 years ago

Yes. In the screenshots, one run succeeded and one failed within the same minute; I just launched the job 3-4 times manually. If I keep launching it, it alternates between failure and success. I don't think it is linked to a network issue: as I understand it, chmod could not modify the file because it did not find it, so I guess the plugin did successfully connect to a container. It's as if it was connected to a container, but not the right one. The .sh file is copied correctly, but it cannot be executed. Here is a debug log:

[workflow] beginExecuteNodeStep(cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1): NodeDispatch: ScriptFileItem{script=[42 chars]}
copying file: '/var/docker/data/rundeck/var/tmp/dispatch8183123616408905231.tmp' to: 'cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1:/tmp/288-34-cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1-dispatch-script.tmp.sh'
PUT: '/var/docker/data/rundeck/var/tmp/dispatch8183123616408905231.tmp'
Copied '/var/docker/data/rundeck/var/tmp/dispatch8183123616408905231.tmp' to '/tmp/288-34-cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1-dispatch-script.tmp.sh
Running chmod +x /tmp/288-34-cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1-dispatch-script.tmp.sh
chmod: impossible d'accéder à '/tmp/288-34-cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1-dispatch-script.tmp.sh': Aucun fichier ou dossier de ce type
Ran chmod +x /tmp/288-34-cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1-dispatch-script.tmp.sh
Reading '/tmp/test341618504357270-429418367.pid' on https://rancher.sequoiasoft.com/v2-beta/projects/1a1466/containers/1i921647/?action=execute
Failed: PluginFailed: Process 1437 status 1
[workflow] finishExecuteNodeStep(cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1): NodeDispatch: PluginFailed: Process 1437 status 1

The log is in French: "chmod: impossible d'accéder à '/tmp/288-34-cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1-dispatch-script.tmp.sh': Aucun fichier ou dossier de ce type" means "chmod: cannot access '/tmp/288-34-cust_preprod-cust-app-com-rsl-task-pre-prod-cust-app-com-1-dispatch-script.tmp.sh': No such file or directory".
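
To test the "connected to a container, but not the right one" idea, the debug logs of a successful and a failed run could be compared field by field. The following is only a sketch: `summarize` and the three regular expressions are hypothetical helpers, inferred from the log format shown above rather than from the plugin's source, so they may need adjusting for other Rundeck or plugin versions.

```python
import re

# Hypothetical patterns, inferred from the debug log above (not from the plugin source).
COPY_RE = re.compile(r"copying file: '[^']+' to: '(?P<node>[^':]+):(?P<path>[^']+)'")
CHMOD_RE = re.compile(r"Running chmod \+x (?P<path>\S+)")
EXEC_RE = re.compile(r"/containers/(?P<container>[^/]+)/\?action=execute")


def summarize(log_text):
    """Extract the node name, copy destination, chmod target and the
    container ID of the execute URL from one debug log."""
    copy = COPY_RE.search(log_text)
    chmod = CHMOD_RE.search(log_text)
    execute = EXEC_RE.search(log_text)
    return {
        "node": copy.group("node") if copy else None,
        "copied_to": copy.group("path") if copy else None,
        "chmod_target": chmod.group("path") if chmod else None,
        "exec_container": execute.group("container") if execute else None,
    }
```

If `exec_container` differs between a successful and a failed run for the same `node`, that would support the wrong-container hypothesis. If it is identical, the copy step itself becomes the more likely suspect.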

kdebisschop commented 3 years ago

I cannot seem to reproduce this at all -- I have tried with both the file copy based on the rancher CLI, and the configuration that uses the web socket to do the copy without the rancher CLI.

You are using the web socket -- I see that from the PUT.

Maybe that is failing (it is a bit of a hack) -- if the file for the inline script is larger than mine, maybe it becomes unreliable? Can you install rancher CLI on the RunDeck host and add its path to the file copier configuration...then see if that makes a difference?
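
One way to probe the size hypothesis would be to run the same command from inline scripts of different sizes and see whether the failure rate changes. The sketch below is an assumption-laden illustration: `padded_script` is a hypothetical helper, and the padding is made of plain bash comment lines so the script's behavior stays the same.

```python
def padded_script(command, target_bytes):
    """Build a bash script that runs `command` and is padded with comment
    lines until it is at least `target_bytes` long (UTF-8 encoded)."""
    script = "#!/bin/bash\n" + command + "\n"
    filler = "# padding " + "x" * 60 + "\n"
    while len(script.encode("utf-8")) < target_bytes:
        script += filler
    return script
```

For example, `padded_script("hostname", 4096)` and `padded_script("hostname", 65536)` could each be pasted into an inline script step and launched several times.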

kdebisschop commented 3 years ago

Here's a screenshot of the configuration I'm referring to:

[Screenshot: Screen Shot 2021-04-19 at 22 10 37]

For this to work, the rundeck account role needs to be able to access the rancher CLI and the CLI needs to be able to access the required credentials to communicate with your rancher system.
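
As a quick sanity check of the first requirement, something like the following could be run as the same user that runs Rundeck. It is a minimal sketch: `find_cli` is a hypothetical helper, the binary name `rancher` is an assumption about how the CLI is installed, and the check says nothing about whether the CLI can reach its credentials.

```python
import shutil


def find_cli(name="rancher", path=None):
    """Return the path of an executable called `name` found on `path`
    (defaults to the PATH environment variable), or None if there is none."""
    return shutil.which(name, path=path)
```

A `None` result means the current user cannot find or execute the binary, in which case the path entered in the file copier configuration would need to point at it explicitly.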