Closed takkeybook closed 7 years ago
I could reproduce it when the script wasn't executable; the error I got was:
/bin/sh: 1: scripts/step1.sh: Permission denied
2017-05-26 13:21:13 +1000 [ERROR] (0017@[0:default]+sample+step1): Task failed with unexpected error: Command failed with code 126
But when I made the shell script executable ( chmod a+x scripts/step1.sh ), it ran:
2017-05-26 13:21:51 +1000 [INFO] (0017@[0:default]+sample+step1): sh>: scripts/step1.sh
job 1 start
Friday 26 May 13:21:52 AEST 2017
k7
job 1 end
Success. Task state is saved at /tmp/dig_test/.digdag/status/20170526T000000+0900 directory.
In my case, the script is executable. In fact, the corresponding task succeeded when it ran on a digdag server running v0.9.7.
The problem exists not only in digdag v0.9.12 but in v0.9.8 and later.
I'd like to add that this problem does not happen when I run the workflow from the command line, i.e. digdag run sample.dig --session "YYYY-MM-DD"
This simple task failure occurs only when the workflow is run from the web UI or the task is kicked by the scheduler.
I found the corresponding error messages, as follows:
Jun 1 12:04:00 ip-10-130-66-122 java: 2017-06-01 12:04:00 +0900 [INFO] (0067@[0:sample]+sample+step1) io.digdag.core.agent.OperatorManager: sh>: scripts/step1.sh
Jun 1 12:04:00 ip-10-130-66-122 java: /bin/sh: line 1: scripts/step1.sh: No such file or directory
Jun 1 12:04:00 ip-10-130-66-122 java: 2017-06-01 12:04:00 +0900 [ERROR] (0067@[0:sample]+sample+step1) io.digdag.core.agent.OperatorManager: Task failed with unexpected error: Command failed with code 127
Jun 1 12:04:00 ip-10-130-66-122 java: java.lang.RuntimeException: Command failed with code 127
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.standards.operator.ShOperatorFactory$ShOperator.runTask(ShOperatorFactory.java:143)
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.util.BaseOperator.run(BaseOperator.java:35)
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.core.agent.OperatorManager.callExecutor(OperatorManager.java:312)
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:254)
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:137)
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.core.agent.LocalWorkspaceManager.withExtractedArchive(LocalWorkspaceManager.java:25)
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:135)
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:119)
Jun 1 12:04:00 ip-10-130-66-122 java: at io.digdag.core.agent.MultiThreadAgent.lambda$null$0(MultiThreadAgent.java:127)
Jun 1 12:04:00 ip-10-130-66-122 java: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
Jun 1 12:04:00 ip-10-130-66-122 java: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
Jun 1 12:04:00 ip-10-130-66-122 java: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Jun 1 12:04:00 ip-10-130-66-122 java: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Jun 1 12:04:00 ip-10-130-66-122 java: at java.lang.Thread.run(Thread.java:748)
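The two failures in this thread carry different exit codes, which point at different causes: 126 means /bin/sh found the script but could not execute it (the "Permission denied" case above), while 127 means /bin/sh could not find the file at all (the server's "No such file or directory" case). A minimal sketch outside digdag, using a throwaway /tmp/demo_step.sh (the path and file are hypothetical, just for the demonstration):

```shell
# Create a script without the execute bit.
printf '#!/bin/sh\necho hello\n' > /tmp/demo_step.sh
chmod a-x /tmp/demo_step.sh

# Found but not executable -> "Permission denied", exit code 126.
sh -c '/tmp/demo_step.sh' 2>/dev/null || echo "not executable -> exit $?"

# After chmod a+x it runs normally, exit code 0.
chmod a+x /tmp/demo_step.sh
sh -c '/tmp/demo_step.sh'

# A script that does not exist at all -> exit code 127.
sh -c '/tmp/no_such_script_xyz.sh' 2>/dev/null || echo "missing -> exit $?"
```

So the local error (126) was a simple permission problem, but the server-side error (127) means the script file was not present in the workspace at all, which is a different bug.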
It seems to me that the external shell script files packed together with the workflow file may not be retrieved from the revision_archives table, or may not be placed in the expected directory.
Hmm... I can't reproduce this issue.
@takkeybook Could you give us more detailed information?
(e.g. the output of find . -exec ls -l {} + under the project root)
@komamitsu Here is the detailed info.
$ find . -ls
42237461 0 drwxr-xr-x 4 kagimoto_takashi drecom 68 Jun 1 19:23 .
42237463 4 -rw-r--r-- 1 kagimoto_takashi drecom 31 Jun 1 19:22 ./.gitignore
50563429 0 drwxr-xr-x 2 kagimoto_takashi drecom 36 Jun 1 19:22 ./scripts
42230011 4 -rwxr-xr-x 1 kagimoto_takashi drecom 78 Jun 1 19:22 ./scripts/step1.sh
42230014 4 -rwxr-xr-x 1 kagimoto_takashi drecom 78 Jun 1 19:22 ./scripts/step2.sh
42230026 4 -rw-r--r-- 1 kagimoto_takashi drecom 129 Jun 1 19:23 ./sample.dig
92433853 0 drwxr-xr-x 4 kagimoto_takashi drecom 29 Jun 1 19:24 ./.digdag
$ digdag run sample.dig --session "2017-06-01"
2017-06-01 19:29:29 +0900: Digdag v0.9.12
2017-06-01 19:29:30 +0900 [INFO] (main): Using session /home/kagimoto_takashi/digdag/sample/.digdag/status/20170601T000000+0900.
2017-06-01 19:29:30 +0900 [INFO] (main): Starting a new session project id=1 workflow name=sample session_time=2017-06-01T00:00:00+09:00
2017-06-01 19:29:31 +0900 [INFO] (0016@[0:default]+sample+prepare): echo>: start 2017-06-01T00:00:00+09:00
start 2017-06-01T00:00:00+09:00
2017-06-01 19:29:32 +0900 [INFO] (0016@[0:default]+sample+step1): sh>: scripts/step1.sh
job 1 start
Thu Jun 1 19:29:32 JST 2017
kagimoto_takashi
job 1 end
2017-06-01 19:29:42 +0900 [INFO] (0016@[0:default]+sample+step2): sh>: scripts/step2.sh
job 2 start
Thu Jun 1 19:29:42 JST 2017
kagimoto_takashi
job 2 end
Success. Task state is saved at /home/kagimoto_takashi/digdag/sample/.digdag/status/20170601T000000+0900 directory.
As shown above, I cannot reproduce the trouble when I use the Digdag CLI.
Environment in which I run the Digdag server:
Options I specify when running the Digdag server:
-n 65432 -b 0.0.0.0 -O /var/log/digdag-server/task -A /var/log/digdag-server/access -c /etc/digdag/postgresql.properties --log-level trace
@takkeybook Thanks for the information. But I still can't reproduce the issue on my side...
Could you try this command and tell me what the archived project includes?
$ digdag download (the project name)
@komamitsu I really appreciate your help.
I tried the command as you suggested and got the following results. The files necessary to execute the tasks are retrieved:
cd WORKDIR
digdag download sample
cd sample
find . -ls
92563817 0 drwxr-xr-x 3 kagimoto_takashi drecom 37 Jun 2 15:35 .
101807045 0 drwxr-xr-x 2 kagimoto_takashi drecom 36 Jun 2 15:35 ./scripts
101807046 4 -rwxr-xr-x 1 kagimoto_takashi drecom 78 Jun 2 15:35 ./scripts/step1.sh
101807047 4 -rwxr-xr-x 1 kagimoto_takashi drecom 78 Jun 2 15:35 ./scripts/step2.sh
92563819 4 -rw-r--r-- 1 kagimoto_takashi drecom 129 Jun 2 15:35 ./sample.dig
Hmmm... what's going on in my environment?
@takkeybook why don't you try this workflow to debug?:
+pwd:
  sh>: ""
  shell: ["pwd"]

+ls:
  sh>: "scripts"
  shell: ["ls", "-l"]
@frsyuki After creating a workflow file as you suggested, I got slightly weird results when executing it with the Digdag CLI (not on my server), as follows:
2017-06-02 18:22:37 +0900: Digdag v0.9.12
2017-06-02 18:22:39 +0900 [INFO] (main): Using session /home/kagimoto_takashi/digdag/debugShell/.digdag/status/20170602T000000+0900.
2017-06-02 18:22:39 +0900 [INFO] (main): Starting a new session project id=1 workflow name=debugShell session_time=2017-06-02T00:00:00+09:00
2017-06-02 18:22:39 +0900 [WARN] (0016@[0:default]+debugShell+pwd): Skipped
2017-06-02 18:22:39 +0900 [WARN] (0016@[0:default]+debugShell+ls): Skipped
Success. Task state is saved at /home/kagimoto_takashi/digdag/debugShell/.digdag/status/20170602T000000+0900 directory.
2017-06-02 18:22:42 +0900: Digdag v0.9.12
2017-06-02 18:22:44 +0900 [INFO] (main): Using session /home/kagimoto_takashi/digdag/debugShell/.digdag/status/20170602T000000+0900.
2017-06-02 18:22:44 +0900 [INFO] (main): Starting a new session project id=1 workflow name=debugShell session_time=2017-06-02T00:00:00+09:00
2017-06-02 18:22:45 +0900 [INFO] (0016@[0:default]+debugShell+pwd): sh>:
/home/kagimoto_takashi/digdag/debugShell
2017-06-02 18:22:45 +0900 [INFO] (0016@[0:default]+debugShell+ls): sh>: scripts
2017-06-02 18:22:45 +0900 [ERROR] (0016@[0:default]+debugShell+ls): Task failed with unexpected error: java.io.IOException: Broken pipe
java.lang.RuntimeException: java.io.IOException: Broken pipe
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at io.digdag.standards.operator.ShOperatorFactory$ShOperator.runTask(ShOperatorFactory.java:139)
at io.digdag.util.BaseOperator.run(BaseOperator.java:35)
at io.digdag.core.agent.OperatorManager.callExecutor(OperatorManager.java:312)
at io.digdag.cli.Run$OperatorManagerWithSkip.callExecutor(Run.java:686)
at io.digdag.core.agent.OperatorManager.runWithWorkspace(OperatorManager.java:254)
at io.digdag.core.agent.OperatorManager.lambda$runWithHeartbeat$2(OperatorManager.java:137)
at io.digdag.core.agent.LocalWorkspaceManager.withExtractedArchive(LocalWorkspaceManager.java:25)
at io.digdag.core.agent.OperatorManager.runWithHeartbeat(OperatorManager.java:135)
at io.digdag.core.agent.OperatorManager.run(OperatorManager.java:119)
at io.digdag.cli.Run$OperatorManagerWithSkip.run(Run.java:668)
at io.digdag.core.agent.MultiThreadAgent.lambda$null$0(MultiThreadAgent.java:127)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:320)
at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:149)
at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
at java.io.BufferedWriter.close(BufferedWriter.java:266)
at io.digdag.standards.operator.ShOperatorFactory$ShOperator.runTask(ShOperatorFactory.java:131)
... 15 common frames omitted
2017-06-02 18:22:46 +0900 [INFO] (0016@[0:default]+debugShell^failure-alert): type: notify
error:
* +debugShell+ls:
Broken pipe (runtime)
Task state is saved at /home/kagimoto_takashi/digdag/debugShell/.digdag/status/20170602T000000+0900 directory.
I mostly get the latter result; the former (both tasks skipped) happens roughly once in ten runs.
@komamitsu @frsyuki According to Furuhashi-san's talk yesterday, I may have figured out what's going on, not in your environment but in mine. To tell the truth, I was running digdag server and digdag scheduler simultaneously; I had misunderstood what digdag scheduler does. Now that I have stopped the scheduler, all the tasks implemented by the bash scripts I described above execute successfully.
If what I found is correct, I will close this issue.
The incident described in the subject suddenly started happening after I updated digdag to the latest version with the "digdag selfupdate" command. It does not happen with digdag v0.9.7. There were no error logs, so I'm not really sure what's going on.
Samples of the workflow file and the shell script it kicks are as follows:
sample.dig
scripts/step1.sh
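The reporter's actual files were not posted in the thread, but the session logs above (task names +prepare/+step1/+step2, the sh> commands, and step1's printed output) suggest roughly the following. This is a hypothetical reconstruction, not the reporter's real 129-byte sample.dig or 78-byte step1.sh:

```shell
# Hypothetical reconstruction inferred from the logged task names and output;
# treat both files as sketches only.
mkdir -p /tmp/digdag-sample/scripts
cd /tmp/digdag-sample

# sample.dig: an echo> task followed by two sh> tasks, per the session log.
cat > sample.dig <<'EOF'
+prepare:
  echo>: start ${session_time}

+step1:
  sh>: scripts/step1.sh

+step2:
  sh>: scripts/step2.sh
EOF

# scripts/step1.sh: prints "job 1 start", the date, the user name, then
# "job 1 end"; the ~10s gap between step1 and step2 in the log suggests
# a sleep in between.
cat > scripts/step1.sh <<'EOF'
#!/bin/sh
echo "job 1 start"
date
whoami
sleep 10
echo "job 1 end"
EOF
chmod a+x scripts/step1.sh

scripts/step1.sh
```

With these files in place, `digdag run sample.dig --session "2017-06-01"` reproduces the CLI behavior shown earlier in the thread.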