Here's one problematic case:
```scala
class Inv[T](val elem: T)

object Test {
  def unwrap[Outer](inv: Inv[Outer]): Outer = inv.elem
  def wrap[Inner](i: Inner): Inv[Inner] = new Inv(i)

  val a = unwrap({
    class Local
    val local = new Local
    wrap(local)
  })
}
```
The inferred type for `a` is `Local`, even though it is out of scope at that point. There are at least two reasons the avoidance doesn't work here currently:

1. The check at https://github.com/lampepfl/dotty/blob/50df4d236bde1f883f07e6bf279320aba7711135/compiler/src/dotty/tools/dotc/core/Types.scala#L4174-L4177 only handles `TermRef(NoPrefix, _)`, but `Local` is a `TypeRef(NoPrefix, Local)`; this is easy to fix.
2. The type variable of `unwrap` is created at the same nesting level as `val a`, so the nesting level check doesn't prevent it from being instantiated to `Local`. This one looks trickier to fix without rethinking how we compute nesting levels; suggestions welcome.

(I'm also growing skeptical about the usage of nesting levels in general: because nesting forms a tree, the fact that a symbol has a smaller nesting level does not necessarily mean it's currently in scope; it might just be in a different branch of the nesting tree. But I haven't tried to write a proof of concept for this yet.)
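As a minimal sketch of that tree-shape concern (a hypothetical `Levels` object, illustrating only the scoping structure, not unsoundness): `LocalA` and `LocalB` have the same nesting level but live in sibling branches, so neither is ever in scope where the other is.

```scala
object Levels {
  def a = {
    class LocalA // nesting level 2, in branch `a` of the nesting tree
    new LocalA
  }
  def b = {
    class LocalB // also nesting level 2, but in sibling branch `b`:
    new LocalB   // LocalA is *not* in scope here despite its equal level
  }
}
```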
Welp, this is going to be trickier than I thought. My plan was to run avoidance on all our open constraints whenever some symbols went out of scope, but I now realize that this isn't enough: these local symbols can have bad bounds, meaning we can derive absurd constraints from them in a way that we cannot recover from just using avoidance. For example, the following crashes at runtime:
```scala
trait Base {
  type M
}
trait A {
  type M >: Int | String
}
trait B {
  type M <: Int & String
}

object Test {
  def foo[T](z: T, x: A & B => T): T = z

  def main(args: Array[String]): Unit = {
    val a = foo(1, x => (??? : x.M))
    val b: String = a // ClassCastException
  }
}
```
Because our constraints successively look like:

```
?T >: 1
?T >: x.M           // because 1 <: x.M
?T >: Int & String  // because avoidance picks the upper bound of x.M,
                    // which happens to also be a subtype of its lower bound
```
This particular example could be fixed by checking whether the constraints we end up with after typing the lambda entail the constraints we had before, but I'm not convinced this is enough. More radically, we could simply throw away all constraints computed locally, then constrain the result type of the lambda after avoidance to be a subtype of its expected type, to recover constraints that are not tainted by reasoning-via-bad-bounds. This would also avoid the need for running avoidance on all constraints as I originally planned to do.
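As a rough illustration of the proposal's effect on the example above (a sketch, not actual compiler output), the constraint evolution would become:

```
?T >: 1                   // from the first argument, z = 1
?T >: x.M                 // local constraint from typing the lambda body
// on leaving the lambda: discard the local constraint, keep ?T >: 1
?T >: 1 | (Int & String)  // re-constrain the avoided body type: Int & String <: ?T
```

Instantiating `?T := 1 | (Int & String)` would then correctly reject `val b: String = a`, since `1 <: String` does not hold.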
> `val b: String = a // ClassCastException`
Why does this even type check? We have `?T >: 1` and `?T >: Int & String` (which is equivalent to having `?T >: Int & String | 1`). There is nothing `?T` could be consistently instantiated to that would allow for `?T <: String`, which is required on this line. Indeed, that would imply `Int & String <: String` (ok) and `1 <: String` (not ok).
I tried to show that in my description of the constraints involved: we start with `?T >: 1`, but then we narrow this to `?T >: x.M`, and not `?T >: 1 | x.M` as one might expect, because `1 <: x.M`. After avoidance this becomes `?T >: Int & String`, and instantiation then picks `?T := Int & String`. The flaw here is that `x.M` has bad bounds and was allowed to taint constraints defined outside of the scope where `x` is visible.
Ok, I see.
Conceptually, if you were to go with a level-based approach (which is what I've been experimenting with privately), you should not allow registering a bound `x.M` mentioning a local symbol `x` inside an outer-level type `?T`. Instead, in such a case of extrusion, you'd introduce an existential and register `?T :> exists x: X; x.M`, i.e., `?T :> X#M`, from which a lower bound can't be leveraged, according to the theory of sound type projections 😄
Anyway, I know existentials and type projections are being phased out (as opposed to fixed). These are just some intuition nuggets, in case they give you new ideas on how to solve this! I think proper type-theoretic existentials (not Scala 2 ones!) are a good way of thinking about type avoidance.
> Conceptually, if you were to go with a level-based approach (which is what I've been experimenting with privately), you should not allow registering a bound `x.M` mentioning a local symbol `x` inside an outer-level type `?T`.
I don't think this is enough in general: it seems to me that the mere existence of a value with bad bounds in scope means that you can taint your global constraints, even if that particular value never appears in those constraints.
For example, imagine that I locally know that `x.M >: Int <: String` and that I have a global constraint `?T <: String`; then in principle I should be able to refine that constraint to `?T <: Int`. In practice this doesn't work because we don't have transitivity: if I try to register that constraint, the compiler will check `Int <: String` and fail to find any evidence for that, but that means we're on really shaky foundations.
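For concreteness, such a scope can be set up with the same intersection trick as in the example above (a hypothetical sketch; `A2`, `B2` and `withAB` are made up for illustration):

```scala
trait A2 { type M >: Int }    // lower bound only
trait B2 { type M <: String } // upper bound only

// For any x: A2 & B2, x.M has the bad bounds >: Int <: String, so inside the
// lambda one could derive Int <: x.M <: String, i.e. Int <: String, by
// transitivity, without x.M itself ever appearing in a global constraint.
def withAB[T](f: A2 & B2 => T): T = ???
```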
Thinking about it more: instead of introducing a new variable with bad bounds, we can also use the GADT logic to refine the bounds of an existing type. My first attempt was:
```scala
trait Base {
  type M
}
trait A extends Base {
  type M >: Int | String
}
trait B extends Base {
  type M <: Int & String
}

enum Foo[T] {
  case One extends Foo[String]
  case Two(val elem: A & B) extends Foo[elem.M]
}
```
```scala
import Foo._

object Test {
  def id[I](x: I): I = x

  def foo[T](x: Foo[T], y: T) =
    // I'd like to infer ?I = T but I get ?I = Any
    id(x match
      case One =>
        1 // infer ?I >: 1
      case Two(_) =>
        y: T // infer ?I >: T because we're reasoning with bad bounds
    )

  def main(args: Array[String]): Unit =
    val x: String = foo(One, "")
    println(x)
}
```
This almost worked: we did end up with the incorrect `?I >: T` after typechecking each case, but the last thing we do in a match is to assign it a type based on the lub of the types of each case, and so we ended up with the safe `?I >: T | Int`.
Finally, I remembered that if the expected type of a match expression is a match type, we type it with that match type instead of the lub of the cases, and I managed to produce this monstrosity which compiles (with one warning about an unreachable case) and crashes with a ClassCastException:
```scala
trait Base:
  type M
trait A extends Base:
  type M >: Int | String
trait B extends Base:
  type M <: Int & String

sealed trait Foo[T]
class One[T]() extends Foo[T]
class Two(val elem: A & B) extends One[elem.M]

object Test:
  type Const[X, Y] <: Y = X match
    case AnyRef => Y
    case Two => Y

  def id[A, I](a: A, x: Const[A, I]): I = x

  def foo[T](x: Foo[T], y: T) = // The inferred result type is `T`!
    id(x, x match
      case _: AnyRef =>
        1 // infer ?I >: 1
      case _: Two =>
        y: T // infer ?I >: T because we're reasoning with bad bounds
    )

  def main(args: Array[String]): Unit =
    val x: String = foo(new One[String], "") // ClassCastException: java.lang.Integer cannot be cast to java.lang.String
    println(x)
```
@Blaisorblade, am I correct to assume that the literature on DOT has not considered the impact of bad bounds on type inference so far?
> if I try to register that constraint, the compiler will check `Int <: String` and fail to find any evidence for that, but that means we're on really shaky foundations
So in that case, the lack of transitivity would be a feature, not a bug :^)
But I see your point about GADT reasoning messing things up in that case. If I understand your example correctly, in the `Two` case, we go from `?I >: 1` to `?I >: T` because within that case, the new bound `1 | T` is (wrongly) simplified to `T`, using the assumption `Int <: T` that does not hold outside the case. What I would say is that there's an unhealthy mixup of constraints at different levels of reasoning here. But it seems that fixing this would require a pretty extensive change in how GADT reasoning and type simplification work, unfortunately.
If I go back to my levels idea, I'd say that `T` within the `Two` case, since it is locally merged with `pattern.M`, should be considered at the same (nested) level as `pattern`, and thus it would be guarded from mixing with `1` in the inferred bound, which would be `?I >: 1 | (exists p: Foo[T] & Two; T)` or `?I >: 1 | (exists p: Foo[T] & Two; p.M)`.
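Spelled out as a constraint trace in this thread's notation (a sketch of the two behaviours, not compiler output):

```
// current behaviour:
?I >: 1                                  // from the first case
?I >: 1 | T                              // after the Two case, before simplification
?I >: T                                  // 1 | T wrongly simplified to T, using the
                                         // local GADT assumption Int <: T
// with levels + existentials:
?I >: 1 | (exists p: Foo[T] & Two; p.M)  // the local part stays quarantined
```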
> @Blaisorblade, am I correct to assume that the literature on DOT has not considered the impact of bad bounds on type inference so far?
I fear you're right; all I can think of is:
Thanks for the references @Blaisorblade!
> If I understand your example correctly, [...]
Yes, that's right!
> If I go back to my levels idea, I'd say that `T` within the `Two` case, since it is locally merged with `pattern.M`, should be considered at the same (nested) level as `pattern`
I'm not sure how that would work in practice: if `T` already appears in constraints outside of the case, then locally changing its nesting level would invalidate those constraints. On the other hand, you could create a local `T1` and use that, but then for GADT reasoning to work, you still want any local subtype check involving `T` to take into account that `T =:= T1`, otherwise you lose expressivity; but that should lead to the same issues mentioned above with having a general transitivity property. So while I like the level idea, I'm getting more convinced that the "throw local constraints away" proposal I made above is really the only thing we can do here. (By the way @LPTK, thanks for taking the time to respond here, it's nice to get a different perspective on this stuff!)
I don't really know better. My work on algorithmic (sub)typing is all based on removing subtyping reflection, but indeed, how to handle subtyping reflection is an interesting problem which I would like to come back to someday. It is not clear to me whether, within D<:, subtyping reflection is already undecidable, but with higher-kinded types as in Scala, subtyping reflection itself induces combinatory logic, which is readily undecidable. It would be particularly interesting if there were a syntax-directed algorithm that could solve part of subtyping reflection.
> On the other hand, you could create a local `T1` and use that
Yeah, this is the kind of thing I end up doing: create a `T1 >: T <: T` whose bounds are valid and can be refined further, but which is equivalent to `T`.
> you still want any local subtype check involving `T` to take into account that `T =:= T1`, otherwise you lose expressivity
Yes, this can be done by replacing `T` by `T1` every time `T` occurs directly in a constraint while we're at the nested level. We know that `T1` is a local alias of `T`, so as long as we're within a scope where we know `T1` exists, this is fine.
the "throw local constraints away" proposal I made above is really the only thing we can do here
But is this sound? I have trouble believing that simply throwing away constraints is not going to create problems. For instance, what about something like `f => x => { f(x); x }`? Are you going to throw away the constraint that says `x` should be a valid argument for `f`?
> Yes, this can be done by replacing `T` by `T1` every time `T` occurs directly in a constraint while we're at the nested level.
Can you really? Suppose you have an existing constraint `?R <: T` and locally you have `T1 := Int`, so you can refine that constraint to `?R := Int`, but that refinement is invalid outside of the current case.
> For instance, what about something like `f => x => { f(x); x }`? Are you going to throw away the constraint that says `x` should be a valid argument for `f`?
In present-day Scala this isn't an issue, because arguments to lambdas are always fully instantiated before typing their body, so yes :). Now if we had https://github.com/lampepfl/dotty/pull/9076 (which was already on shaky grounds), the situation would be more complicated indeed, so this might be a death knell for #9076 (or maybe we can still salvage some of it, I haven't spent much time thinking about it).
> Suppose you have an existing constraint `?R <: T` and locally you have `T1 := Int`, so you can refine that constraint to `?R := Int`, but that refinement is invalid outside of the current case.
So, let's say we have `?R <: T` and locally we have `T =:= T1 =:= Int`. We try to constrain `Int <: ?R` to refine its lower bound, for instance, which should not work. Upon inserting a bound on the lower-level variable `?R`, we first need to make sure the type is at the same level (here that's already the case, because `Int` is at level 0), and then I'd say we continue and propagate the bound at the level of `?R`, where `T =:= T1` no longer holds.
This level update upon going through lower-level type variables makes sense if you think about the fact that `{ val tmp = f(x); y => ... tmp ... }` should type check similarly to `{ y => ... f(x) ... }`, meaning the constraints introduced by `f(x)`, which are independent from `y`, should not be "captured" as constraints local to the lambda. That is, unless `y` somehow has an influence on the expression – if we were to conceptually elaborate `{ y => ... f(x) ... }` as something like `{ y => ... f(x: y.T) ... }`, things would be quite different.
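A minimal compilable version of that equivalence (`f`, `v1` and `v2` are made up for illustration):

```scala
def f[A](a: A): A = a

// The constraint ?A >: Int produced by f(x) does not mention y, so these two
// should typecheck the same way: hoisting f(x) out of the lambda must not
// change the level at which its constraints are solved.
def v1(x: Int) = { val tmp = f(x); (y: String) => tmp }
def v2(x: Int) = { (y: String) => f(x) }
```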
> In present-day Scala this isn't an issue, because arguments to lambdas are always fully instantiated before typing their body, so yes :)
This is not very reassuring, though. I would not be surprised at all if there were many corner cases where things can still go sideways. It does not seem like a good idea to rest the soundness of the type system on an "accidental" type inference limitation :^)
> Now if we had #9076 (which was already on shaky grounds), the situation would be more complicated indeed
Well, I think the current GADT reasoning implementation, its interactions with type inference, and the proposed approach of simply discarding constraints wholesale seem to be on vastly shakier grounds than the simple idea of inferring lambda parameter types!
In any case, this discussion raises many interesting questions!
> and then I'd say we continue and propagate the bound at the level of `?R`, where `T =:= T1` no longer holds.
Ah, I see: so constraint propagation for a given variable is done in a context where more deeply nested variables aren't visible, basically.
Meanwhile, I've also realized that GADT constraints can propagate outwards incorrectly even without bad bounds being present /cc @abgruszecki :
```scala
sealed trait Foo[T]
class One[T]() extends Foo[T]
class Two() extends One[Int]

class Contra[-X]

object Test:
  type Const[X, Y] <: Y = X match
    case AnyRef => Y
    case Two => Y

  def id[A, I](a: A, x: Const[A, Contra[I]]): Contra[I] = x

  def foo[T](x: Foo[T], y: T) = // The inferred result type is `Contra[1]`!
    id(x, x match
      case _: AnyRef =>
        new Contra[T] // infer ?I <: T
      case _: Two =>
        // T := Int
        new Contra[1] // infer ?I <: 1
    )
```
Besides the approaches we've discussed above, it's also worth mentioning that GHC solves this problem by having implication constraints: instead of registering `?I <: 1`, you'd register `?T := Int implies ?I <: Int` (see p. 35 of https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/jfp-outsidein.pdf).
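Applied to the example above, the idea would look roughly like this (a sketch in this thread's notation, not actual compiler syntax):

```
case _: AnyRef => new Contra[T]  -- registers:                     ?I <: T
case _: Two    => new Contra[1]  -- registers: (?T := Int) implies ?I <: 1
```

The second constraint is only usable under the GADT assumption `?T := Int`, so it cannot leak into the outer-level solution for `?I`.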
Congrats, @smarter !
Reopening as we had to temporarily turn off level-checking to avoid regressions due to interactions with other compiler parts that will take a while to sort out: https://github.com/lampepfl/dotty/pull/15343
#8867 fixed a tricky avoidance problem (#8861) by introducing new techniques for preventing closure types from capturing variables they are not supposed to see.
There are several areas where our understanding of this issue is still incomplete. It would be good to follow this up with test cases, code fixes and possibly formalizations, as the subject matter is extremely slippery and the requirements for type inference are extremely stringent.
I left in the code base of #8867 several comments starting with:

> This should be followed up.