Open patmagee opened 3 months ago
It does not resemble python in the same way, But I personally kind of like [sample.name scatter(sample in samples if sample == "blood")]
, because it would potentially allow the following pattern:
# Filtered scatters without reassignment!
scatter(sample in sample if sample == "blood"){
call foo
call bar
call biz
}
I like adding the filtering syntax to scatter. You could do:
Array[String] samples = ["blood", "tissue", "brain"]
scatter (sample in samples) {
if (sample == "blood") {
String sample_type = sample
}
}
But then you end up with Array[String?] sample_type = ["blood", None, None]
, whereas
scatter (sample in samples if sample == "blood") {
String sample_type = sample
}
would (presumably) yield Array[String] sample_type = ["blood"]
. It's nice to not have to then select_all
.
As for list comprehension, I'm supportive. I'd suggest dropping the parens and just using scatter
as an alias for for
. Or we could introduce for
as a language keyword that would essentially be an alias for scatter
.
List[String] sample_names = [sample.name scatter sample in samples if sample.type == "blood"]
or
List[String] sample_names = [sample.name for sample in samples if sample.type == "blood"]
I think the scatter
keyword carries a connotation of the computation being distributed (even though that's not specified as a requirement of scatter
- the engine has the option of executing it locally). So it might be a good idea to add the for
keyword with the explicit requirement that for
expressions are evaluated locally.
@jdidion although the implication with scatter is that the computations WITHIN the scatter are executed remotely but not the scatter itself
If this filtering functionality gets added to the scatter, is there really still a point in adding a separate list comprehension?
scatter(sample in samples if sample.include){String sample_names = sample.name}
and
Array[String] sample_names = [sample.name for sample in samples if sample.include]
would have the same effect.
Don't get me wrong, I like list comprehensions, but in this case it seems like it's just adding more syntax for people to learn. While the same effect can be achieved with already existing (except for the filtering) and equally concise syntax.
Regarding the distinction between scatter
and for
: I my mind, scatter
implies simultaneous/parallel execution (regardless of whether that is done locally or distributed), whilst for
implies consecutive execution (ie. the tradition programming for-loop). So every "iteration" of a scatter
is independent and disconnected from every other "iteration". While an iteration of a for-loop may be impacted by the previous iterations.
@DavyCats good question and I would say they are quite distinct, but the utility of the list comprehension is far greater.
Scatters are confined to workflows and introduce an inner scope and any variables set there will be available in the outer scope
List comprehension on the other hand
What I just realized is that we already have an implicit iterator foo in bar
. So instead of adding new ways to refer to the same operation (ie a for
keyword)maybe we just extend that
foo in bar if foo > 3: foo * 2
This can be the base form that fits really well into an existing scatter
scatter(foo in bar if foo > 3: foo * 2) {
...
}
Then you can use this in an expression directly
Array[File] files
Array[String] contents = [file in files if size(file, "mb") < 5: read_string(file)]
@patmagee this could work. I'm not sure about :
(which we don't use anywhere else in wdl
) vs. {}
. E.g.
Array[String] contents = [file in files if size(file, "mb") < 5 { read_string(file) }]
Another thing to consider is an implicit iterator variable, e.g. it
in Groovy. We already have input
as a keyword, so we could repurpose that as an iterator variable:
Array[String] contents = [read_string(input) in files if size(file, "mb") < 5]
I wonder if it's better to just bite the bullet and introduce for
as a keyword. Having sequential execution has been something that has been requested anyway so it could be a small step towards actual for loops
I feel like without introducing a more complex concept like map
we are pigeonholed into introducing for
, only for the sake of simplicity and consistency with user expectation
Array[String] = [ read_string(input) for file in files if size(file, "mb") < 5) ]
I am on the fence of whether we should add a scatter form at all, or keep the List comprehension constrained to the expression. While the fact that we can use it right in a scatter looks nice
# Simple form
scatter(foo in bar if foo > 3){
}
# mutate
scatter(foo * 2 for foo in bar if foo > 3){
}
It inevitably opens the door to a for
scatter, that kind of looks weird
for (foo for foo in bar){
}
So it may actually be simpler to just use list comprehension inline. Although there is also the recursion issue again: x in .. b in ..
scatter(foo in [b * 2 for b in bar if b > 3]){
}
I agree with your last point - not necessary to change scatter
syntax if we have list comprehension.
A common pattern that has evolved in WDL is the need to convert a list of type A into a list of type B. Most programming languages support list comprehension to accomadate this, but WDL has no built in method for it. Coincidentally we have already implemented a similar pattern with ternary operators (
if-then-else
) but have not gone as far as list comprehensionCurrently, you can fake list comprehension using a scatter
While this is technically a form of list comprehension, it has a few drawbacks
Support List Comprehension Directly
The simplest strategy we can take is to support list comprehension directly within WDL, potentially following the Python syntax
One unintended consequence (for better or worse) of the above example is that it adds
for
as a reserved KW to the specification, without introducing afor
construct. I think this is an acceptable tradeoff, but if we wanted to keep with existing semantics we could do somethingAn additional benefit to list comprehension is the ability to easily add filtering