gogolaylago · closed 4 years ago

So I don't know if I'm being dense, but I can't figure out how to run Flintrock from a bash script and catch errors. This is my script.sh file:

$ cat script.sh
...

Even when an error occurs in my_python_script.py, the .sh script exits as if nothing happened. What should I do?
Have you tried set -e at the start of script.sh?
I did, so I added

set -x
set -e
...

to print each command as it runs and to interrupt when one fails, but no luck.
So basically flintrock run-command returns a success even if the run command fails on the cluster?
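(One quick way to check that directly, as a sketch: run a command that is guaranteed to fail, such as the shell builtin false, and inspect Flintrock's own exit status afterwards. The config file and cluster name below are the ones used elsewhere in this thread.)

# 'false' always exits non-zero; if the remote failure propagates,
# flintrock itself should exit non-zero as well.
flintrock --config config.yaml run-command --master-only test_cluster 'false'
echo $?   # 0 here would mean the remote failure was swallowed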
Correct. The weird thing is this: when I run this command on the cluster
flintrock --config config.yaml run-command --master-only test_cluster 'spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.6 my_python_script.py'
I get this error (I purposely referenced an undefined variable, TODAY_STR, in my .py script):
Traceback (most recent call last):
File "/home/ec2-user/my_python_script.py", line 22, in <module>
print('Running files on %s' % TODAY_STR)
NameError: name 'TODAY_STR' is not defined
But when I execute the .sh file with the above command in it, nothing happens. Flintrock just goes ahead and destroys the cluster instead of stopping:
...
++ flintrock --config config.yaml run-command --master-only test_cluster "spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.6 my_python_script.py $1
ret_code=$?
if [ $ret_code -ne 0 ]; then
exit $ret_code
fi"
Running command on master only...
[54.204.180.54] Running command...
[54.204.180.54] Command complete.
run_command finished in 0:00:31.
++ flintrock --config config.yaml destroy --assume-yes test_cluster
Destroying test_cluster...
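(Going by the set -x trace above, script.sh presumably looked something like the following. This is a hypothetical reconstruction, not the actual file, and the quoting of the command string is a guess.)

set -x
set -e
# The quoted command string is passed to the cluster as-is.
flintrock --config config.yaml run-command --master-only test_cluster 'spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.6 my_python_script.py $1
ret_code=$?
if [ $ret_code -ne 0 ]; then
  exit $ret_code
fi'
# set -e never triggers here: run-command exited 0 even though
# spark-submit failed, so the script proceeds to destroy the cluster.
flintrock --config config.yaml destroy --assume-yes test_cluster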
What happens if you capture your commands in a script, upload that to the cluster with copy-file, and then execute it with run-command?
i.e.
cat << EOM > le-script.sh
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.6 my_python_script.py $1
ret_code=$?
if [ $ret_code -ne 0 ]; then
exit $ret_code
fi
EOM
flintrock copy-file ...
flintrock run-command ... "chmod u+x le-script.sh"
flintrock run-command ... "le-script.sh" # should correctly reflect exit code of le-script.sh
I'm wondering if the string of commands jammed into a single string is somehow suppressing the return value.
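(A side note on the heredoc above: because the delimiter EOM is unquoted, the shell creating the file expands $1 and $? while writing it, so le-script.sh ends up containing their current values rather than the literal text. Quoting the delimiter keeps the body verbatim; a sketch:)

# Quoted delimiter ('EOM'): $1 and $? are written to the file literally.
cat << 'EOM' > le-script.sh
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.6 my_python_script.py $1
ret_code=$?
if [ $ret_code -ne 0 ]; then
  exit $ret_code
fi
EOM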
I think you might be right. I did what you suggested, and the exit code was correctly reported. So I removed this part from "le-script.sh":
> ret_code=$?
> if [ $ret_code -ne 0 ]; then
> exit $ret_code
> fi
and then ran flintrock run-command ... "le-script.sh" again; the script was successfully interrupted. I wonder what it is about the above lines that collides with the command, though. Hmmmm.
Anyways, thank you!!
> I wonder what it is about the above lines that collides with the command, though. Hmmmm.
It's probably something about how shell return codes are interpreted for a single command vs. a series of commands. Perhaps some fiddling with sub-shells or other Bash/shell constructs might give you more of the behavior you're looking for, but I'm not sure.
e.g.
flintrock --config config.yaml run-command --master-only test_cluster "(
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.6 my_python_script.py $1
ret_code=$?
if [ $ret_code -ne 0 ]; then
exit $ret_code
fi
)"
(Note the enclosing parentheses.)
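(Another construct that may be worth a try, sketched here and untested against Flintrock: start the command string itself with set -e, so the remote shell stops at the first failing command and that failure's exit status becomes the status of the whole string.)

# If spark-submit fails, set -e stops the sequence right there and
# the non-zero status should propagate back through run-command.
flintrock --config config.yaml run-command --master-only test_cluster 'set -e
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.6 my_python_script.py $1
echo "spark-submit finished"'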
But certainly, the most reliable/understandable approach is to copy up a script to the cluster and execute it there, making sure to include set -e in the script.
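(A concrete sketch of that approach, reusing the names from earlier in the thread; the quoted heredoc delimiter keeps $1 literal, and with set -e in place the explicit ret_code check is no longer needed. The flintrock arguments are elided here just as they were above.)

cat << 'EOM' > le-script.sh
#!/bin/bash
set -e   # abort immediately if any command fails
spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.6 my_python_script.py $1
EOM

flintrock copy-file ...
flintrock run-command ... "chmod u+x le-script.sh"
flintrock run-command ... "./le-script.sh"   # exit code now reflects spark-submit's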