Open jpfairbanks opened 5 years ago
This recovers dimensional analysis, and the linear algebra matrix shape rules enforced by most linear algebra systems.
For example in regular julia code that uses Array{T,N}
the array type captures the information about the dimensionality, but not the sizes. When you use [StaticArrays] (http://juliaarrays.github.io/StaticArrays.jl/stable/pages/api.html) the types encode the sizes and you could statically reason about whether the code will raise a DimensionError
For example
function p(a::Matrix{T},b::Matrix{T})
return a*b + a
end
x = rand(Float64, (5,2))
y = rand(Float64, (2,2))
p(x,y)
if you define C
as the class of valid linear algebra models
, f
can be computed by running the code and making sure there were no DimensionMismatch
exceptions.
If you apply the only available swap (swap!(m, x,y)
), then you get
function p(a::Matrix{T},b::Matrix{T})
return a*b + a
end
y = rand(Float64, (5,2))
x = rand(Float64, (2,2))
p(x,y)
which errors because a*b
will be incompatible.
but if you then refine the types to use SMatrix, everything works.
function p2(a::SMatrix{T, Val{M1,N1}},b::SMatrix{T, Val{(M2,N2)}})
return a*b + a
end
y = SMatrix(rand(Float64, (5,2)))
x = SMatrix(rand(Float64, (2,2)))
p(x,y)
there is nothing to swap and so there are no ambiguities in p2
.
A basic workflow that this type of formalization can support would be:
For a given program $p_1$, identify/detect potentially ambiguous types.
true
. Note that we assume here that we're able to filter the set of candidate data types, to avoid (in most cases) primitive-level swaps.Transform $p_1$ to $p_2$ by introducing structs and/or more precise data types to resolve ambiguities.
If we iteratively apply step one to $p_2$, the ambiguities we initially detected will have been resolved, and will not be identified as candidates for type disambiguation.
This issue touches on the validation task in that we should expect some swaps to result in alterations to programs that do not halt execution, but do meaningfully alter results. Ideally, we'd be able to flag inappropriate swaps of this nature by observing cases where the results represent marked deviations from previously observed values/outputs.
One thing we are going to run into here is values that get passed to kwargs and other "optioney" things, we would have to introduce a lot of types like Val(n::Int), Natural{Int}, Probability{R} where R <: Real
in order to prevent negative numbers from going places they shouldn't
This paper gives a CT treatment of linear algebra. https://hal.inria.fr/hal-00919866/document
Going to tag this with punt because full implementation of the oracle approach we discussed won't happen for this release, but would be a natural next step for next cycle.
Scientific code is written with underspecified types. There are values of different semantic meanings that are represented with the same
DataType
in the code. For example there are often many different variables of typeVector{Float64}
where the entries in the vectors represent dramatically different concepts. In a perfect world, every different modeling concept would have its own type with semantics defined so that the compiler could check that the program "follows the rules" of how those concepts should interact. For example, Unitful creates new types for unitful quantities and enables the compiler to enforce the semantics of dimensional analysis.We discussed one heuristic for finding variables of the same type that should actually be different types. "If two variables take disjoint sets of values, then we should split that type into two more specific types." A more general form of this would be:
Let
C
be a class of models andf
be an oracle that determinesf(m) = true iff m in C
. Define a functionswap(m, a, b)
that takes the modelm
and swaps all occurances of the valuea
with the valueb
. Iff(swap(m,a,b)) = f(m)
thena
andb
are interchangeable.We want to define a program as well typed if for every pair of values
a,b
,a,b
are interchangeable if and only if they are the same type. I think this would go a long way towards making the concept of "the type system represents the semantics of the model" into a rigorous mathematical concept.