tiborsimko opened this issue 6 years ago
After investigating the YAML standard regarding multi-line strings, I have found three ways in which we can support multi-line commands:
1. '>' syntax (block scalar, folded style):
workflow:
  type: serial
  specification:
    steps:
      - environment: 'python:2.7'
        commands:
          - >
            echo "Running ${helloworld}." &&
            python "${helloworld}"
            --sleeptime ${sleeptime}
            --inputfile "${inputfile}"
            --outputfile "${outputfile}"
Available to try at https://github.com/reanahub/reana-demo-helloworld/pull/32/commits/07c8fdb8d1d328563a4abf7dcf424591ead6420d.
Potential source of errors with this approach: it took me a while to realise why the '>' syntax was not working. As the standard states, line folding (which allows long lines to be broken for readability) does not apply to lines that are more indented than the rest of the block, so the following change breaks it. The '-' lines are the working version and the '+' lines the broken one (more info here):
workflow:
  type: serial
  specification:
    steps:
      - environment: 'python:2.7'
        commands:
          - >
            echo "Running ${helloworld}." &&
            python "${helloworld}"
-           --sleeptime ${sleeptime}
-           --inputfile "${inputfile}"
-           --outputfile "${outputfile}"
+             --sleeptime ${sleeptime}
+             --inputfile "${inputfile}"
+             --outputfile "${outputfile}"
This is what the command from the working version looks like in the container:
$ kubectl get -o yaml pod bc381d0e-7ca6-43dd-87cb-2a02d0758a45-4dgp9
...
- command:
  - bash
  - -c
  - "cd /reana/users/00000000-0000-0000-0000-000000000000/workflows/03d7521d-e606-4d48-b9d7-4a9e42ad0e15
    ; echo \"Running code/helloworld.py.\" && python \"code/helloworld.py\" --sleeptime
    2 --inputfile \"inputs/names.txt\" --outputfile \"outputs/greetings.txt\"\n "
...
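To make the folding rule concrete, here is a minimal Python sketch of how a '>' block treats the two variants above. It is a simplified model written for illustration only: fold_block is a hypothetical helper, not part of any YAML library. It takes the content lines below the '- >' indicator, assumes they are all non-empty, assumes default "clip" chomping, and implements only the rule that a line break next to a more-indented line is kept.

```python
def fold_block(lines: list[str]) -> str:
    """Simplified model of a YAML '>' (folded) block scalar.

    Assumes non-empty lines and default "clip" chomping. The block
    indentation is taken from the first line. A line break is folded into a
    space only when neither neighbouring line is more indented than the
    block; otherwise the break (and the extra indentation) is kept.
    """
    indent = len(lines[0]) - len(lines[0].lstrip(" "))
    content = [line[indent:] for line in lines]
    folded = content[0]
    for prev, cur in zip(content, content[1:]):
        more_indented = prev.startswith(" ") or cur.startswith(" ")
        folded += ("\n" if more_indented else " ") + cur
    return folded + "\n"  # "clip" chomping keeps a single final line break


working = [
    '    python "${helloworld}"',
    '    --sleeptime ${sleeptime}',
]
broken = [
    '    python "${helloworld}"',
    '      --sleeptime ${sleeptime}',
]
print(repr(fold_block(working)))  # 'python "${helloworld}" --sleeptime ${sleeptime}\n'
print(repr(fold_block(broken)))   # 'python "${helloworld}"\n  --sleeptime ${sleeptime}\n'
```

In the broken variant the line break survives, so `bash -c` receives `--sleeptime ...` on its own line and tries to run it as a separate command.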
2. '|' syntax (block scalar, literal style):
workflow:
  type: serial
  specification:
    steps:
      - environment: 'python:2.7'
        commands:
          - |
            echo "Running ${helloworld}."
            python "${helloworld}" --sleeptime ${sleeptime} \
              --inputfile "${inputfile}" \
              --outputfile "${outputfile}"
Available to try at https://github.com/reanahub/reana-demo-helloworld/pull/32/commits/3fdcc4717d709d5feff1dc1bdac96c6362fb3946.
This approach is closer to how multi-line commands are written in Dockerfiles.
This is how it ends up looking inside the container:
$ kubectl get -o yaml pod 3a774fc3-5a83-4305-be68-edf93382e78d-wv579
...
- command:
  - bash
  - -c
  - "cd /reana/users/00000000-0000-0000-0000-000000000000/workflows/6fb6fc46-a8d9-46b2-9bb6-6875e9537833
    ; echo \"Running code/helloworld.py.\"\npython \"code/helloworld.py\" --sleeptime
    2 \\\n --inputfile \"inputs/names.txt\" \\\n --outputfile
    \"outputs/greetings.txt\"\n "
...
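Here is a similar sketch for the '|' variant, again a simplified model with hypothetical helper names. The literal block keeps every line break, and any extra indentation is kept as plain spaces rather than changing how lines are joined. It is bash, not YAML, that joins the backslash-terminated lines. The quoting caveat is noted in the code.

```python
def literal_block(lines: list[str]) -> str:
    """Simplified model of a YAML '|' (literal) block scalar with "clip" chomping.

    The block indentation is taken from the first line and stripped from
    every line; extra indentation and every line break are kept.
    """
    indent = len(lines[0]) - len(lines[0].lstrip(" "))
    return "\n".join(line[indent:] for line in lines) + "\n"


def bash_continuations(script: str) -> str:
    """Remove backslash-newline pairs, as bash does for line continuation.

    Simplification: ignores quoting, so a backslash-newline inside single
    quotes (which bash keeps literally) would also be removed here.
    """
    return script.replace("\\\n", "")


step = [
    '    echo "Running ${helloworld}."',
    '    python "${helloworld}" --sleeptime ${sleeptime} \\',
    '      --inputfile "${inputfile}"',
]
print(bash_continuations(literal_block(step)).splitlines())
# ['echo "Running ${helloworld}."', 'python "${helloworld}" --sleeptime ${sleeptime}   --inputfile "${inputfile}"']
```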
3. Multi-line plain scalar (flow scalar, plain style):
workflow:
  type: serial
  specification:
    steps:
      - environment: 'python:2.7'
        commands:
          - echo "Running ${helloworld}." &&
            python "${helloworld}" --sleeptime ${sleeptime}
            --inputfile "${inputfile}"
            --outputfile "${outputfile}"
Available to try at https://github.com/reanahub/reana-demo-helloworld/pull/32/commits/9bce4936f94e7f706d87fddb92eec2c2694b34f9.
This is the least powerful approach because it has many limitations: to avoid ambiguity, many characters are forbidden (for example, a plain scalar cannot contain ': ' or ' #'). It is also possible to enclose the whole string in double or single quotes and escape any characters that are special inside the quotes (more info here). This is how it ends up looking inside the container:
$ kubectl get -o yaml pod a8f3a74c-8e1b-4266-8311-9b64e0f31120-4mdbl
...
- command:
  - bash
  - -c
  - 'cd /reana/users/00000000-0000-0000-0000-000000000000/workflows/b953b28f-b7f2-44fa-a60c-8464fd65ad45
    ; echo "Running code/helloworld.py." && python "code/helloworld.py" --sleeptime
    2 --inputfile "inputs/names.txt" --outputfile "outputs/greetings.txt" '
...
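The "forbidden characters" limitation can be illustrated with a conservative Python check for block context. This is a sketch of a subset of the YAML 1.2 rules, not a full validator, and is_safe_plain_scalar and to_yaml_scalar are hypothetical names. When the check fails, the sketch falls back to a double-quoted scalar produced with json.dumps. This relies on JSON string escapes also being valid in YAML double-quoted scalars.

```python
import json

# Characters that may not start a plain scalar (YAML 1.2, block context).
# '-', '?' and ':' are handled separately: they are allowed when followed
# by a non-space character, e.g. "--sleeptime".
START_INDICATORS = set(",[]{}#&*!|>'\"%@`")


def is_safe_plain_scalar(text: str) -> bool:
    """Conservative check that `text` round-trips as a one-line plain scalar."""
    if not text or text != text.strip() or "\n" in text:
        return False  # empty, padded or multi-line text cannot be kept as-is
    if text[0] in "-?:":
        if len(text) == 1 or text[1] in " \t":
            return False
    elif text[0] in START_INDICATORS:
        return False
    # ": " would start a mapping and " #" a comment; a trailing ':' is
    # likewise read as a mapping key.
    for bad in (": ", ":\t", " #", "\t#"):
        if bad in text:
            return False
    return not text.endswith(":")


def to_yaml_scalar(text: str) -> str:
    """Emit `text` as a plain scalar if safe, else as a double-quoted one."""
    return text if is_safe_plain_scalar(text) else json.dumps(text)


print(to_yaml_scalar('python "${helloworld}" --sleeptime ${sleeptime}'))
# python "${helloworld}" --sleeptime ${sleeptime}
print(to_yaml_scalar('echo "Step: done"'))
# "echo \"Step: done\""
```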
In conclusion, I think we should definitely go for a block scalar, because option 3 could easily become messy with escaped characters. Between the two block scalars, I would choose the literal style (option 2), since the indentation pitfall of the folded style (option 1) will definitely cause problems for users. Moreover, the standard recommends the literal style for code blocks.
cc'ing @reanahub/developers since this directly affects users.
@diegodelemos Nice summary; I also prefer option 2, where the use of backslashes seems rather intuitive. (E.g. Travis CI does the same in multiline conditions: https://docs.travis-ci.com/user/conditions-v1#line-continuation-multiline-conditions.)
However, I'm not sure about the "visual non-splitting" of the echo and python commands in your second example; see e.g. its JSON representation:
$ yaml2json reana.yaml | jq -S '.workflow.specification.steps'
[
  {
    "commands": [
      "echo \"Running ${helloworld}.\"\npython \"${helloworld}\" --sleeptime ${sleeptime} \\\n --inputfile \"${inputfile}\" \\\n --outputfile \"${outputfile}\"\n"
    ],
    "environment": "python:2.7"
  }
]
The notion that there are multiple commands is lost there. It would be nice if commands were a list.
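Once everything is a single string, the list of commands can only be recovered heuristically. Here is a Python sketch that assumes the JSON shown above is loaded as-is. split_commands is a hypothetical helper. It joins backslash continuations, collapsing the whitespace around them to one space, and then treats each remaining line as one command. It ignores quoting, heredocs and multi-line constructs such as if/fi, which is why an explicit list would be more robust.

```python
import json
import re

# The "commands" value printed by yaml2json | jq above, as JSON text.
raw = r'''["echo \"Running ${helloworld}.\"\npython \"${helloworld}\" --sleeptime ${sleeptime} \\\n --inputfile \"${inputfile}\" \\\n --outputfile \"${outputfile}\"\n"]'''

commands = json.loads(raw)
print(len(commands))  # 1: the whole block is a single string


def split_commands(script: str) -> list[str]:
    """Hypothetical helper: split a literal block into logical shell commands.

    Joins backslash-newline continuations (normalising the surrounding
    whitespace to a single space), then treats each remaining non-empty line
    as one command. Ignores quoting, heredocs and multi-line constructs.
    """
    joined = re.sub(r"[ \t]*\\\n[ \t]*", " ", script)
    return [line.strip() for line in joined.splitlines() if line.strip()]


for command in split_commands(commands[0]):
    print(command)
```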
Seeing
- command1 arg11 arg12
  command2 arg21 arg22 arg23 \
    arg24 arg25
people might treat it as:
- command1 arg11 arg12 && \
  command2 arg21 arg22 arg23 \
    arg24 arg25
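The difference matters at run time. With a plain newline, the second command still runs after the first one fails, whereas && stops. Here is a small Python sketch using subprocess with POSIX sh, where `false` stands in for a failing command1. Bash behaves the same way for these two scripts, as long as `set -e` is not in effect.

```python
import subprocess


def run_sh(script: str) -> str:
    """Run `script` with `sh -c` and return its standard output."""
    result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return result.stdout


# Newline-separated: the second command runs even though `false` failed.
print(repr(run_sh("false\necho command2 ran")))    # 'command2 ran\n'
# Chained with &&: the second command is skipped because `false` failed.
print(repr(run_sh("false && echo command2 ran")))  # ''
```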
Consider a longer example, such as:
- command1 arg11 arg12
- command2 arg21 arg22 arg23 \
    arg24 arg25
- command3 arg31 arg32
- command4 arg41 arg42 arg43 \
    arg44 arg45
- command5 arg51
...
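If commands stays a list, the runtime still has to turn it into one shell invocation. Here is one possible way, sketched in Python under stated assumptions. build_bash_command is a hypothetical helper, not REANA's actual code. It borrows the "cd to the workflow directory, then run" shape seen in the kubectl output above. Unlike that output, it chains everything with && so that a failing entry stops the step. It also assumes each list entry is a single command, so an entry containing ';' or several separate lines would not be chained as a whole.

```python
import shlex


def build_bash_command(workdir: str, commands: list[str]) -> list[str]:
    """Hypothetical helper: join a step's command list into one bash call.

    Assumes each entry is a single command (backslash continuations are
    fine); entries are chained with '&&' so the first failure stops the step.
    """
    steps = [f"cd {shlex.quote(workdir)}"] + [c.strip() for c in commands]
    return ["bash", "-c", " && ".join(steps)]


print(build_bash_command("/tmp/workflow", [
    "command1 arg11 arg12",
    "command2 arg21 arg22 arg23 \\\n  arg24 arg25",
]))
```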
Currently, reana.yaml contains long instructions written on a single line.
It would be useful to accept multi-line formats for better readability.
A quick experiment with YAML's standard '>' technique to allow for newlines did not work; see https://github.com/reanahub/reana-demo-worldpopulation/pull/22#discussion_r213982149.
Investigate this.