Open orodeh opened 4 years ago
I agree that implicit int -> string and string -> int typecasts should be forbidden.
I am not so sure about the optional stuff. This is already explicit right? the select_all
and select_first
functions are there to convert optionals into defined variables.
# convert an optional file-array to a file-array
Array[File] files = ["a", "b"]
Array[File]? files2 = files
# Convert an optional int to an int
Int a = 3
Int? b = a
What is the exact problem with this code? In practice (in BioWDL at least) optional values are only used to set optional command line parameters in tasks. So the following works:
call myTask {
inputs:
my_required_input = 5
my_optional_input = 3
}
After your proposal it will have to be:
call myTask {
inputs:
my_required_input = 5
my_optional_input = Some(3)
}
I don't see what this accomplishes except that it makes WDL harder to write.
Forcing people to be explicit about their coercions does make WDL more verbose. However, it opens the door to using type inference (@cjllanwarne mentioned this a long time ago).
The WDL type system is actually quite complex. We have overloading, size
can be called in several different ways. We also have parametric polymorphism, cross
and zip
work on any type of array. On top of that, we have auto-coercions, which interact badly with the former two features. If we drop them, I -think-, we could do Mindley-Milner type inference which would make WDL less verbose.
A realistic way of doing this is support type-inference inside a task or a workflow, supplemented with explicit coercions when needed.
Implementing WDL bindings in java for wdl4j
has actually casused me to run into a heck of a lot of issues similar to what @orodeh is describing. There some very unintuitive actions which make expressions evaluatable, despite the fact that the LHS and the RHS simply do not match.
Additionally, it complicated by the fact that one could argue that with auto type coercions, as long as there is a path to the target type, you should be able to coerce any LHS to any RHS.
For example in MiniWDL Array[String]
can be coerced to String
. Whether this is a bit of syntactic sugar for their use case, or actually allowed int he spec is a bit unclear (the rules around type coercion are a bit nebulous in the spec given how important a feature it is).
Given the above one could do something like
Array[String] a = ["A","1.2"]
Int? b = 1 + a # a -> string -> int -> whole thing to ?
This really does not make any sense, and its a bit ambiguous what it will eventually resolve into. Additionally, there is no strict ruels around what is actually coercible to what... (rules were added in the development spec.. but they still are very vagues). This leads to alot of potential issues in executing workflows on different systems.
Its worth noting too that even python uses explicit type conversions through functions like int(x)
,float(x)
, str(x)
etc.
Coercion semantics usually have another requirement called coherence that needs to be satisfied, which is much more difficult to prove. For reference, here one of the papers:
https://link.springer.com/chapter/10.1007%2F3-540-54415-1_70
I have not seen proofs for WDL, but I'm sure the developers must have them somewhere.
Hope it helps, ~p
@pgrosu your confidence in us is comforting, but I am not to sure you are going to find any sort of formal proofs, or documentation on coherence. If this is an area that you are comfortable in, and could help us flesh out that would be much much appreciated
How about type coercion of the inputs/outputs from the standard library functions? On the one hand - all inputs and outputs must be typed. But I do see some use cases where:
scatter (pair in zip(foo, bar)) {}
I could not find any such case in the spec, and I think this is a bit contradicting the static typing requirement. In general though seems these examples compile and run, but we start to see coercion problems when nesting such functions and applying them to optional types. Like this:
Array[Int] foo
Array[Int] bar
Array[String?] lol
scatter (nested_pair in zip(zip(foo, bar), lol)) {}
In this case we see the problems of coercion for values of lol
which were evaluated to null
, throwing exceptions at runtime. If we declare and type the outputs - all fine, no exceptions:
Array[Pair[Pair[Int, Int], String?]] arr = zip(zip(foo, bar), lol)
scatter (nested_pair in arr) {}
We can solve this in dxCompiler and allow such scenarios, but isn't this in the realm of automatic type coercion and contradictory to strong/static typing of the language?
Could you provide more explicit guidance on typing the inputs/outputs for the standard library functions in the language spec?
Automatic coercions considered harmful
Authors: O. Rodeh with J. Didion (DNAnexus Inc.)
This issue describes a line of thought regarding type checking for WDL programs. It is based on experience acquired while working with DNAnexus customers on their WDL pipelines. The author also wrote a WDL type checker, evaluator, and cloud executor.
Coercions
WDL makes wide ranging use of coercions to convert one type to another. This is, in general, a well accepted idea used in a wide variety of programming languages. At DNAnexus we have run into several instances where WDL workflows errored out because static type checking was too lenient. This resulted in long running expensive customer pipelines failing. The issue is twofold: the turn around time is long, and the cost can run to thousands of dollars.
For example, under the WDL type system, a
File
is auto-coercible intoFile?
. However, this is not the case in the DNAnexus simpler type-system. If you have a task that takes anArray[File?]
and call it withArray[File]
this will fail. For example:Workflow
bar
will fail when it callsfoo
, after it has spent a long time running an expensive calculation. This is frustrating for the users. We could prevent this, as well as many other errors, by performing stricter type checking. Below is an eclectic list of automatic coercions that appears in GATK pipelines. The GATK pipelines are large real world workflows that our customers expect to execute flawlessly on our platform. The code is a simplified version of the real program lines, stripping a way the context to focus on the main point.Converting between
T
andT?
occurs automaticallyConverting between strings, integer, and floats "just works".
The branches of a conditional may not resolve to the same type. Here, one side is
Array[String]
, the other isArray[String?]
.Since
T
andT?
are interchangeable, we can perform operations likeT
+T?
+T
.The following non-intuitive code type checks in WDL (womtool-51.jar) :
And then
aai
type checks too:Proposal
Explicit coercions are useful and make sense, for example:
However, auto-coercions can lead to unexpected behavior and costly/confusing job failures. We suggest allowing explicit coercions, adding a few necessary stdlib function for type conversion, and disallowing automatic coercions. A few necessary stdlib functions, written a la Scala, are:
In order to select the first non empty integer, as below:
one would write:
This would make the typing of
select_first
accurate; it would only take an array ofT?
.Code like:
would be written:
A Pythonic way of writing these functions would be to have a single cast function that takes a type and a value and either casts the value to the type or throws an error. For example:
cast could even handle optional types, which would get rid of the need for Some(), e.g.
I hope this approach would reduce the number of preventable runtime errors.