Open ccshan opened 7 years ago
This is actually somewhat tricky. But definitely worth doing.
Seems related to #59, which has been open for quite some time.
It is 'morally' related to #59, but in terms of implementation, these two are entirely disjoint.
The question I have about this issue is whether the desired semantics are to check that the index is valid (and produce the most specific type known about that index), or to assume it is valid (and to not examine the index at all, and to produce the type which holds for every element of the array).
Based on the example, I think the desired output is
> kb2 := KB(Constrain(idx(xs, 0) < 10), Introduce(xs, HArray(HInt(Bound(`>=`, 0)))))
> getType(kb2, idx(xs,0));
HInt(Bound(`<`, 10), Bound(`>=`, 0))
(that is, the former of the above two options). If that's the case, is it still correct to take the latter option when no information is known about the index (e.g. idx(xs, i)
), or should that be an error?
I believe your guess is correct, but we'll need to wait for @ccshan to chime in. My guess would be that in all cases you return the best information you can (including knowing nothing) rather than returning an error.
My guess would be that in all cases you return the best information you can (including knowing nothing) rather than returning an error.
To elaborate on my question, consider the (trivial) example:
> KB:-getType(KB:-empty, x)
FAIL
(note: when I said "produces an error", I should have said "returns FAIL"). Should the new semantics produce FAIL
for idx(xs, i)
if i
isn't present in the KB, or should it assume that i
is indeed a valid index (which simply hasn't been included in the KB, perhaps because the caller knows there is no interesting information about i
) and to return the "best information possible", i.e. what is known about every element of xs
? Or in other words, which behaviour for idx
is more consistent with the behaviour for variables?
Given that we currently have no way in (concrete) Hakaru to express an array full of bounded nat
s, it is essential that this enhancement to getType
take into account all the information in the KB about that particular array index, not just a generic element of the array. So as @yuriy0 guessed, the desired output is indeed
HInt(Bound(`<`, 10), Bound(`>=`, 0))
(except <=
would be nice... it seems that https://github.com/hakaru-dev/hakaru/blob/master/maple/KB.mpl#L348 should be enhanced too...).
Moreover (contrary to "the former of the above two options"), it would be nice if we don't check that the index is valid, even as we use the index for information.
So why not check if an index is valid?
In some sense, I see several cases:
If we had the invariant "by the time that an index is dereferenced, then it is always valid" [i.e. in the operational semantics], then we could certainly make 'nice' assumptions. But is that invariant actually true? If it's not, then I think we're introducing a very sneaky source of bugs.
Here's my rough reason to not check if an index is valid: it's natural for a programmer to write something like fn xs array(prob): fn i nat: ...xs[i]...
, plan to only pass indices i
that are in range for the array xs
, but not explicitly check or guard for i < size(xs)
. So if it's extra work for the programmer to add the guard i < size(xs)
and it's extra work for us to check if an index is valid, let's cancel out the extra work on both sides.
That is explicitly why I spoke of invariants. I should have added the word 'global' in there, I guess.
Down this road lies much pain. This is the road that led CASes like Maple to always turn x/x
into 1 and x -x
into 0. x-x
being 0 seems fine until you realize that x could be a matrix, and now 0
isn't polymorphic enough. x/x
= 1 seems fine, until you substitute in 0, or infinity -- or a function!
But I guess we can always say that Hakaru works along the lines of "Well, if you're too lazy to say what you mean, we'll guess, and sometimes we'll be wrong. But we will give you no indication whatsoever what guess we made, or even that we made a guess." That is certainly very consistent with Maple.
PS: it is also very natural for programmers to write junk. Many current mainstream programming languages build that attitude right in. Program transformations (which are kinds of symbolic computations) have this way of making this much worse, unlike just running the program, which somehow does not 'blow up' junk nearly as badly.
In other words: I don't buy arguments of the kind "it's very natural for programmers to do X". That kind of entropy gives us Go as somehow being 'modern'. The whole point of using Haskell (for example) is to get away from the kinds of mistakes too easily done by this very kind of thinking!!!
In the case of fn xs array(prob): fn i nat: ...xs[i]...
, by the time we arrive at xs[i]
, we should have a KB which looks something like
KB( Introduce(i, HInt(Bound(`>=`,0))), Introduce(xs, HArray(HReal(Bound(`>=`,0), Bound (`<=`,1)))) )
in which case, I think getType(kb, idx(xs,i))
can produce HReal(Bound(`>=`,0), Bound (`<=`,1)))
, since i
is introduced in the KB, even though there is no guarantee that idx(xs,i)
is a valid index (without even trying to check that i
could be a valid index, which it could be). I think it would be very hard in general to try to validate the input program at this stage. Also, there are many sensible programs which, for some instantiation of their variables, produce an error at runtime, but a user would very much like to write anyways. E.g. adding a guard to check i < size(xs)
would incur an additional operational cost that may never serve useful, and the user might want an error to happen if i >= size(xs)
(but there is no way to write that error-throwing program explicitly).
A stronger type system in Hakaru could help users express invariants about their code without risking throwing any errors, as well as help machines processing that code understand that the user intends to use their program in only particular ways. e.g. the above could have the stronger (currently inexpressible) type fn k nat: fn xs vec(k, prob): fn i nat(< k): ...xs[i]...
. (But this would be a huge undertaking!)
The above example is in contrast to getType( KB( Introduce( xs, HArray(..) ) ), idx(xs, i) )
, in which i
is not bound at all. Here we should FAIL
, because this is very likely a programmer error (that is, a programmer who wrote part of the Maple code) as opposed to a user error (that is, a programmer who wrote the input Hakaru program).
I was definitely not thinking of these as types, to be dealt with in a 'type system'. Guards, sure. Annotations even.
It is not essential to me that we do not check that the index is valid. What's essential to me is what I said to be essential above. Regarding that, @yuriy0, there's no way to get an upper bound 1 because the Hakaru type prob
only means a lower bound 0, not an upper bound 1.
The handy function
KB:-getType
is used at https://github.com/hakaru-dev/hakaru/blob/27de9787321139cdb5ebb6b25101b149745a9a43/maple/Summary.mpl#L215 to see if an expression is a bounded integer. This only works if the expression is a variable. What's the best way to do the same for other expressions, in particularidx
expressions?More concretely, this works:
And it would be nice if this were to work too: