Closed brendanzab closed 4 years ago
A simple example is indexing an array with [i-1], does this imply a constraint that i > 0 that we check in advance, or do we just wait and to the bounds check just before indexing and abort if it fails.
It would be super nice if in the future we could cull lots of bounds checks at compile time. Our type system should be expressive enough to allow that, the tricky thing is making sure that we don't inadvertently throw safety out the door along the way.
Interesting references in the Puffs language's related work document.
Keywords mentioned in our meeting:
Reposting @markbrown's comment from email:
Here's some formal rules for bounds checking of integers. First, some preliminaries:
Expressions
e ::= x Variable | k Constant | -e Negate | e1 + e2 Plus | e1 - e2 Minus | e1 * e2 Times | e1 / e2 Divide
Environments map each variable to a range
Γ ::= . | x : l .. u, Γ (l ≤ u)
The following judgement says that in the environment
Γ
, the expressione
has a value betweenl
andu
inclusive.Γ ⊢ e : l .. u
The
Var
rule looks up the range in the environment. The Const rule says that constantk
has rangek .. k
.x : l .. u ∈ Γ ---------------- (Var) Γ ⊢ x : l .. u -------------- (Const) Γ ⊢ k : k .. k
For negation, the bounds are negated and also swapped. For addition we simply add the corresponding bounds. Subtraction is just a combination of these two rules.
Γ ⊢ e : l .. u ----------------- (Neg) Γ ⊢ -e : -u .. -l Γ ⊢ e1 : l1 .. u1 Γ ⊢ e2 : l2 .. u2 -------------------------------------- (Plus) Γ ⊢ e1 + e2 : l1 + l2 .. u1 + u2 Γ ⊢ e1 : l1 .. u1 Γ ⊢ e2 : l2 .. u2 -------------------------------------- (Minus) Γ ⊢ e1 - e2 : l1 - u2 .. u1 - l2
This is enough to show that, if
i
has the range1 .. 10
theni - 1
has the range0 .. 9
, and can safely be used as an array index without checking the lower bound.So far it should be pretty easy to see that the above rules are correct*.
Times
is a bit harder to explain, but if you draw theX-Y
plane and sketch out various rectangles representing ranges onX
andY
, you should be able to see that the maximum and minimum values of the product are always at corner points. So the easiest way to calculate the bounds is to calculate the four products corresponding to the corners of the rectangle, the find the minimum and maximum of that set. Viz:Γ ⊢ e1 : l1 .. u1 Γ ⊢ e2 : l2 .. u2 Σ = {l1 * l2, l1 * u2, u1 * l2, u1 * u2} ---------------------------------------- (Times) Γ ⊢ e1 * e2 : min Σ .. max Σ
Hopefully you get the idea, so I'm going to leave division as an exercise. Note that you need to carefully define its behaviour for negative inputs. Also note that you don't need to worry about the bound in the case that the divisor is zero, since in this case the program will abort anyway. So, for example, if
X
andY
are in the range0 .. 10
thenX / Y
will be in the range0 .. 10
, not0 .. +inf
.We also add a
Sub
rule so that an expression can be used where a larger range is expected, such as in an array index.Γ ⊢ e : l .. u l' ≤ l u ≤ u' ---------------------------------- (Sub) Γ ⊢ e : l' .. u'
In the
Sub
rule the Curry-Howard isomorphism comes into play, in that we get bits of program to correspond with our bits of proof. The thing is, in the target language the smaller range might fit into a byte, whereas the larger range might need to be larger, or signed. So we need a cast in the generated code. Essentially the parts of the proof where we useSub
will correspond with parts of the program where we might need a cast.In a similar vein, we could add one last rule for implicit coercions. This is like
Sub
, except that the range becomes smaller instead of larger. Also likeSub
, it would correspond to something in the generated program, except in this case the code would be a bounds check, not a cast. (But like it was said in the meeting, maybe we should require explicit checks at first.)Γ ⊢ e : l .. u l ≤ l' u' ≤ u ---------------------------------- (Coerce) Γ ⊢ e : l' .. u'
It's worth noting that the method is really very incomplete, particularly if negative numbers and multiplication are involved. But for our application area I'm quite confident about the effectiveness we'll get from it.
Relevant Rust RFC: rust-lang/rfcs#671 in it looking at Sage was recommended by @kmcallister.
At the moment our arithmetic expression syntax is very rich, combining the following:
-x
,x + y
,x - y
,x * y
,x / y
!x
,x << y
,x >> y
,x & y
,x | y
,x ^ y
I'm still a bit unclear as to how thos will interact with the other features of the language, like array sizes and pointer offsets. Do we want to think about overflow and overflow as well?