iguberman closed this issue 4 years ago.
How about an additional declaration statement:
declare inline : join-files gen-xx-sequence;
This way, task definitions and the details of how to execute them are kept separate, and one can see at a glance what is inlined and what is not. You could also easily add more annotation types later and still keep the workflow script uncluttered.
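To make the proposed separation concrete, here is a minimal Python sketch of a dispatcher that consults the set of task names collected from `declare inline` statements. Listed tasks run immediately on the local machine, and everything else is queued for a remote worker. This is not Cuneiform code and not how Cuneiform's scheduler is actually implemented; `Task`, `Dispatcher`, and the task name `align-reads` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Task:
    name: str
    run: Callable[[], str]  # hypothetical: the task body, returning its output


@dataclass
class Dispatcher:
    """Hypothetical dispatcher: tasks whose names appear in `inline` run
    immediately on the local machine; all other tasks are queued for a
    remote worker."""
    inline: Set[str]
    remote_queue: List[Task] = field(default_factory=list)
    local_results: Dict[str, str] = field(default_factory=dict)

    def submit(self, task: Task) -> None:
        if task.name in self.inline:
            # Inline: execute right away, with no scheduling round-trip.
            self.local_results[task.name] = task.run()
        else:
            # Everything else goes to the (stand-in) remote scheduler queue.
            self.remote_queue.append(task)
```

The task definitions never mention where they run; only the `inline` set, which would come from the declarations, decides that.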
Sounds great! In particular, I agree that more such annotations might come up and that it's better to keep them separate. I could still declare them right next to the task if I wanted to, like below, right? That is a minor detail of how to organize your code, though. The important thing is that the functionality exists.
deftask gen-xx-sequence( <items(String)> : last ) in bash *{
items=$(for item in $(seq 0 "$last"); do printf '%02d\n' "$item"; done)
}*
declare inline : gen-xx-sequence;
Placement and even order wouldn't matter, as always, and having multiple declare inline statements should be no problem either. As for actually getting the feature, I'll have to bring up basic Condor support in the Erlang version first.
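Why placement, order, and repetition don't matter can be seen in a small sketch: if every `declare inline` statement simply contributes names to one set, the result is the same however the statements are arranged. The regex below assumes only the syntax from the example above (`declare inline : name name ... ;`); it is a naive illustration, not Cuneiform's parser, and it would also match that text inside comments or foreign code. `inline_tasks` is a hypothetical helper.

```python
import re
from typing import Set

# Hypothetical grammar, based only on the example in this thread:
#   declare inline : <task-name> <task-name> ... ;
DECLARE_INLINE = re.compile(r"declare\s+inline\s*:\s*([^;]*);")


def inline_tasks(script: str) -> Set[str]:
    """Return the union of task names across every `declare inline`
    statement in the script. Using a set makes the placement, order, and
    repetition of the statements irrelevant."""
    names: Set[str] = set()
    for match in DECLARE_INLINE.finditer(script):
        names.update(match.group(1).split())
    return names
```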
This feature would break some of the assumptions a Cuneiform function makes:
If the script really is tiny (and does not read from or write to the distributed file system), it will still be fast enough, assuming you use the current Cuneiform scheduler. If you have a lot of these tiny scripts, I suggest choosing a coarser granularity for your Cuneiform foreign functions.
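Choosing a coarser granularity means each foreign function call processes many small items instead of one, so the scheduler sees far fewer tasks. A minimal sketch of the chunking idea in Python (not Cuneiform code; `batches` is a hypothetical helper):

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Split `items` into consecutive chunks of at most `size` elements.
    Each chunk would become one foreign-function call instead of `size`
    separate tiny calls."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For example, 1000 tiny items in batches of 100 produce 10 tasks for the scheduler instead of 1000.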
Allow a task to be inline, meaning this task executes immediately on the same machine without being scheduled to a remote worker. This is useful for small utility bash scripts, or for joins of large files that aren't worth transferring to a remote worker just to join them. Also, if there are a lot of tiny tasks, they might overwhelm the scheduler unnecessarily. (In Condor this is easy to implement: just use the local universe for inline tasks.)
i.e.:
OR
OR maybe?
Complete script example with two very different use cases: