smarter opened this issue 4 years ago
Can we find a single case where inferring GADT constraints from singleton type checks is actually useful? Taking object identity into account when inferring GADT constraints sounds deeply bogus to me. As we can see, it also prevents useful optimizations such as reusing a single instance of `Nil` for an invariant list. The simplest fix would then be to just infer no GADT constraints from singleton type checks.
Can we find a single case where inferring GADT constraints from singleton type checks is actually useful?
I don't have a concrete example, but being able to infer constraints from equality sounds like something that'd be useful when writing theorem-prover-style code; maybe @Blaisorblade has an idea?
it also prevents useful optimizations such as reusing a single instance of Nil
Not at all. `List` is covariant, so `Nil` is perfectly sound as is. This is really only a problem when people rely on `asInstanceOf` working because of type erasure.
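For illustration, with a covariant list the sharing needs no cast at all (a toy sketch, not the real `List`):

```scala
// A toy covariant list (a sketch, not the real scala.collection List).
sealed trait MyList[+A]
case object MyNil extends MyList[Nothing]
final case class Cons[+A](head: A, tail: MyList[A]) extends MyList[A]

// Covariance gives MyList[Nothing] <: MyList[A] for every A,
// so the single MyNil instance is shared without any cast:
val ints: MyList[Int] = MyNil
val strs: MyList[String] = MyNil
```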
Can we find a single case where inferring GADT constraints from singleton type checks is actually useful?
The typing rule makes some sense for objects with type members, tho I’m not sure I can think of a use case there; maybe when you match hlists against null? Emir et al. (2007) present another use-case (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.5295&rep=rep1&type=pdf), but I’m not sure it’s very convincing...
This has been discussed a long time ago in the scalac bug tracker (and apparently it's the reason why polymorphic, non-specialized zero-arg givens aren't lazy vals by default?..). I can't find the link now but I mentioned it in
EDIT: I don’t think the singleton type cast trick is relevant, unless DOT specifies semantics for reference equality? You MUST opt in to insanity for it to lead to unsoundness; it can also be detected and require a language flag.
@smarter
There's still a cast
Surely, you could also avoid the cast by putting the appropriate `private[scala]` methods in the companion objects of the opaque types, no? (Or really just putting `refl` itself there.)
@AleksanderBG
Can we find a single case where inferring GADT constraints from singleton type checks is actually useful?
I'm sure there are many and we just haven't thought of them yet :^) Please do not make things worse just to avoid making them right!
Another way this could be fixed is by using `erased`:

```scala
type =:=[-From, +To]
type <:<[-From, +To] <: =:=[From, To]

erased implicit def refl[A]: A =:= A = ???

extension ops on [From, To](erased ev: =:=[From, To]) {
  def apply(x: From): To = x.asInstanceOf[To]
}
```

The upside (performance-wise and soundness-wise) and downside (usability-wise) is that all `=:=` parameters would need to be annotated with `erased`. We can also get rid of the annoying singleton instance.

Note: collective extensions do not support `erased` yet, but they should.
@nicolasstucki what if people have code like this?
```scala
def test[T: <:<[Int, *]](arr: ArrayBuffer[Int]): ArrayBuffer[T] = arr.map(summon)
```
@LPTK what we could also do is add a `@phantom` annotation to type parameters of classes which says "never infer GADT constraints from this type parameter". Then we could annotate both type parameters of `=:=` with this annotation and be on our merry way, same with the `Nil` of an invariant list. An additional benefit is that I can see how to implement this without too much effort.
@AleksanderBG It seems you never want to mark only one parameter as phantom (also, I'm not sure the name is good). It's a bit annoying that this annotation is specific for singleton type tests and singleton values casted to multiple types, but I'm not sure of an easy way to generalize it.
OTOH, it seems none of the proposals can go in for 3.0 (except a compiler-hardcoded blacklist of types where GADT inference must be avoided), and I'm unsure which can go in for 3.1...
@Blaisorblade This `@phantom` is basically another view of Haskell's type parameter roles. Both for us and for Haskell, "phantom" means "not reflected in the underlying runtime structure". Contrast this with (https://gitlab.haskell.org/ghc/ghc/wikis/roles):
It is often the case that programmers use type variables simply to constrain the type checker, not to make any statement about the runtime representation of a type.
It's just that we're approaching the issue from another angle, since in Haskell people wanted a safe way of coercing between values of two types, and what we want is not inferring GADT constraints from runtime instance checks on values which were "unsafely" coerced.
What exactly makes the compiler assume that it can compile this to a runtime equality check instead of producing an unchecked erasure warning (and why would an opaque type avoid it)?
Casting a single instance of an immutable and invariant class to whatever instance is required is commonplace in the standard library and in other Scala code.
What exactly makes the compiler assume that it can compile this to a runtime equality check
That's the semantics of `x: foo.type` type tests according to the spec.
instead of producing an unchecked erasure warning
there's nothing that is left unchecked, so there's no warning
and why would an opaque type avoid it
opaque types are not injectives like class types, if we have Foo[X] = Foo[Y]
we cannot conclude anything about the relationship between X
and Y
, because we could have defined opaque type Foo[T] = Int
for example.
Casting a single instance of an immutable and invariant class to whatever instance is required is commonplace in the standard library and in other Scala code.
Yes, and I wonder if we could get rid of this pattern. For `=:=` it seems that we easily can, using an opaque type. But I don't have a good solution for `HashMap.empty`:
https://github.com/scala/scala/blob/8f20a7d84924939698717d26b3e0e5a5c0da829c/src/library/scala/collection/immutable/HashMap.scala#L2140-L2146
If `HashMap` were contravariant in `K` then this would be fine (as we couldn't infer any kind of type equality between two instances), but I assume there's a good reason it isn't?
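For reference, the instance-sharing pattern linked above looks roughly like this (a simplified sketch with made-up names, not the actual `HashMap` source):

```scala
// An invariant container that reuses one empty instance across all type arguments.
final class MyMap[K, V] private (val entries: List[(K, V)]) {
  def size: Int = entries.size
}

object MyMap {
  private val theEmpty = new MyMap[Any, Any](Nil)
  // Sound under erasure (the empty map stores no K or V values),
  // but the cast makes one object inhabit MyMap[K, V] for every K and V.
  def empty[K, V]: MyMap[K, V] = theEmpty.asInstanceOf[MyMap[K, V]]
}

val a: MyMap[Int, String] = MyMap.empty
val b: MyMap[String, Int] = MyMap.empty
// a and b are the same object despite incompatible static types
```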
FWIW, as I hinted earlier, this match semantics dates back to the paper introducing case classes in 2007 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.547.5623&rep=rep1&type=pdf). Check out Fig. 11 of that paper.
if HashMap was contravariant in K ... but I assume there's a good reason it isn't
`HashMap` extends `Iterable[(K, V)]` (but even if it didn't, consider such methods as `keysIterator`).
What exactly makes the compiler assume that it can compile this to a runtime equality check
That's the semantics of `x: foo.type` type tests according to the spec.

Btw, that would be `eq` checking, not even `==`/`equals`:
- A singleton type 𝑝.type. This type pattern matches only the value denoted by the path 𝑝 (the eq method is used to compare the matched value to 𝑝).
(https://www.scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html#type-patterns)
opaque types are not injective like class types: if we have `Foo[X] = Foo[Y]` we cannot conclude anything about the relationship between `X` and `Y`, because we could have defined `opaque type Foo[T] = Int` for example.
In the example you're not only establishing `x: B =:= B` but `x: y.type`, i.e. `x` and `y` are the same object. Shouldn't this be sufficient to conclude that `A = B` even if `=:=` is not injective? Is there a fundamental reason why this is not the case, or does the opaque type exploit a limitation of the type checker?
Shouldn't this be sufficient to conclude that A = B even if =:= is not injective?
Injectivity is precisely the property you need to conclude `A = B` from `Foo[A] = Foo[B]`.
does the opaque type exploit a limitation of the type checker?
No, you can achieve the same thing in other ways:

```scala
type Foo[T] = String

("": Foo[Int]) eq ("": Foo[String])
implicitly[Foo[Int] =:= Foo[String]]
```

Again, you can't conclude anything about the relationship between `Int` and `String` from these.
It doesn't really answer my question, but after thinking about it some more I agree. Even `x = y` (rather than `Foo[A] = Foo[B]`) is not sufficient to prove `A = B` if `Foo` is not injective.

Still, the opaque type looks like a hack to me. It prevents `A = B` but not the premise `a.type = b.type`, which is already wrong if you rely on erasure to cast a shared instance to either type. So I suppose that's +1 from me for `@phantom`.
Still, the opaque type looks like a hack to me. It prevents `A = B` but not the premise `a.type = b.type`, which is already wrong if you rely on erasure to cast a shared instance to either type. So I suppose that's +1 from me for `@phantom`.
The `@phantom` approach will behave the same with respect to `a.type =:= b.type` reasoning. So if opaque types are a hack, then so is `@phantom`. Personally I think either is fine, but the `@phantom` approach will break much less code, so I'm in favor of it.
I think the bigger problem right now is `HashMap.empty` and similar things in the standard library. If we can't find a satisfying solution for those then we'll have to follow @AleksanderBG's original proposal and stop inferring GADT bounds from type tests like those, in which case `=:=` might as well stay the way it is right now.
Wouldn't a `@phantom` annotation work for `HashMap` in the same way, as soon as forward BC is dropped and adding classes becomes possible again (which IIRC is planned for 3.0 anyway and was publicly announced)?
That seems weird to me since K is definitely not a phantom type parameter in a hashmap.
also not sure how this phantom annotation would work with class hierarchies, especially since a HashMap[K, V] is an Iterable[(K, V)] as mentioned above
That seems weird to me since K is definitely not a phantom type parameter in a hashmap.
I don't think `Ord[K]` makes a difference; containing a `K` or an `Array[K]` would, and would make the cast unsound.

And there exists one hashmap where they are phantoms; that's clearly enough. Same for the hypothetical "invariant `Nil`" example.
also not sure how this phantom annotation would work with class hierarchies, especially since a HashMap[K, V] is an Iterable[(K, V)] as mentioned above
Argh, great point. Inference is unsound on `Iterable` as well (and any superclass), for the same reason.
- `m1: HashMap[K1, V1]`, `m2: HashMap[K2, V2]`: what can you safely infer from `m1.type = m2.type`? Nothing, because both `m1` and `m2` could be `HashMap.empty`; so both parameters are phantoms.
- `m1: Iterable[A]`, `m2: Iterable[B]`: what can you safely infer from `m1.type = m2.type`? Still nothing, because both `m1` and `m2` could still be `HashMap.empty`.

So, for soundness, if `K[_]` is invariant and `a1: K[A1]`, `a2: K[A2]`, `a1: a2.type`, inferring `A1 = A2` is only sound if `K[_]` contains no `empty`-like values (values that belong to the intersection of `K[B1]` and `K[B2]` for some `B1 != B2`). This property is stronger than injectivity: not only must `K` map different input types to different output types, but the output types must be disjoint.
`Iterable` is covariant, so it should not be a problem.
I'm in favor of making `K` in `HashMap` a `@phantom` type (and having it special-cased in the compiler in the meantime, if we can't change it now).
That seems weird to me since K is definitely not a phantom type parameter in a hashmap.
The `@phantom` annotation removes reasoning capabilities from user code, so it can be soundly approximated. It should be understood as "potentially phantom" (and, as @Blaisorblade says, it indeed is for the empty `HashMap` instance).
How about emitting an `unchecked` warning for type patterns checking for a singleton (`e: p.type`)? In a way, this is similar to a type pattern `e: K[String]`: at run-time we cannot reliably check whether the value conforms to the type or not.

```scala
scala> def f[T](o: Option[T]): String = o match { case s: Option[String] => s.get }
                                                          ^
       warning: non-variable type argument String in type pattern Option[String] is unchecked since it is eliminated by erasure
f: [T](o: Option[T])String

scala> f(Some(1))
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
  at .f(<console>:1)
  ... 28 elided
```
In a way, this is similar to a type pattern `e: K[String]`: at run-time we cannot reliably check whether the value conforms to the type or not.

I think the correct parallel to what you're proposing would be emitting an `unchecked` warning when users match a `List[String]` with `case (x: String) :: xs`, since we cannot reliably check whether the pattern is always valid: `x` may not conform to `String` at runtime, because someone could have used an `asInstanceOf` on the scrutinee list at some point earlier.

In both cases, what this is really about is unsound uses of `asInstanceOf`.
Of course, the "matching scrutinee types" use case is anecdotal in comparison to normal pattern matching, so it may still be okay to have the warning.
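A sketch of that upstream-cast scenario (using a plain binder rather than a type pattern; the failure mode is the same):

```scala
// Somewhere upstream, an unsound cast smuggles an Int into a List[String]:
val xs: List[String] = List[Any](42).asInstanceOf[List[String]]

// This match is statically fine: the compiler knows x must be a String.
// At runtime, the implicit checkcast on the element throws ClassCastException.
def firstLength(ss: List[String]): Int = ss match {
  case x :: _ => x.length
  case Nil    => 0
}
```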
The real issue here is not pattern matching, it is not GADTs, it is not singleton types. The problem is the cast. As soon as you do a cast, you give up your right to soundness.

We use theoretically invalid casts that are actually valid under erasure, for efficiency. But that is at the expense of soundness. No amount of patching it up after the fact will truly fix the issue.
@sjrd to recapture what was said before: the cast mimics the way languages like Haskell reuse object instances when possible. ~~So far, it wasn't possible to cause a type soundness issue with such casts; now, thanks to the World Class GADT Constraint Inference, it is possible.~~ EDIT: Apparently this was already a problem in Scala 2, thanks @neko-kai.
@LPTK I'm kind of surprised you'd say that it "should be OK that you can upcast `Map` to `Iterable`, since `Iterable` is covariant" - that's unusually unprincipled for you. I think that's actually a good argument as to why `@phantom` is not as great a solution as it may appear at first glance - for instance, it may not be possible to actually attach the annotation to the correct superclass, since one may not be able to modify it. Also, now that we need to take upcasting/downcasting into account, we'd significantly complicate the implementation - I didn't think of those when I said that the implementation is going to be simple.
One final observation: we are 28 comments into this discussion, and I have not observed a single person give an example of inferring GADT constraints from singleton type checks being actually useful. Meanwhile, we've seen multiple potential makeshift solutions, and all of them have some holes. If nobody can find such an example, I'd be in favour of simply not inferring GADT constraints from such type checks.
@AleksanderBG This issue is not new to Dotty, it works the same way in Scala 2
I guess GADT constraints from singletons could be useful whenever there's a GADT with object members:
```scala
enum TimeUnit[T] {
  def convert(t: Time): Time = this match {
    case TMonth => t.truncateToMonth
    case TDay   => t.truncateToDay
  }

  case TMonth extends TimeUnit[Month]
  case TDay   extends TimeUnit[Day]
}

def convert[T](tu: TimeUnit[T], t: Time): T = tu match {
  case TMonth => Month(tu.convert(t))
  case TDay   => Day(tu.convert(t))
}
```
@neko-kai that's a good point, but in these cases it's definitely fine to infer constraints, since we're comparing with a statically known object. The example I would be looking for would check if two unknown objects (say, arguments to a function) are actually the same object, and then use the GADT constraints following from that.
@AleksanderBG, there is a paper which makes good use of these constraints, which has already been cited twice by @Blaisorblade. Is there a reason you keep ignoring it? Let me paste the example below for you, since you seem to have trouble opening the PDF 😛
It's about encoding type-safe environments:
```scala
abstract class Env {
  def apply[a](v: Var[a]): a
  def extend[a](v: Var[a], x: a) = new Env {
    def apply[b](w: Var[b]): b = w match {
      case _: v.type => x // v eq w, hence a = b
      case _         => Env.this.apply(w)
    }
  }
}

object empty extends Env {
  def apply[a](x: Var[a]): a = throw new Error("not found: " + x.name)
}

// Evaluation:
def eval[a](t: Term[a], env: Env): a = t match {
  ...
  case f: Lam[b, c] => (y: b) => eval(f.e, env.extend(f.x, y))
  ...
```
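For reference, a self-contained variant of the paper's `Env` that compiles today, with `Var` assumed to be a plain class identified only by object identity, and the cast kept explicit (the singleton-GADT rule under discussion is what would justify dropping it):

```scala
// Assumed simple representation: a Var is identified by object identity only.
class Var[A](val name: String)

abstract class Env {
  def apply[A](v: Var[A]): A

  def extend[A](v: Var[A], x: A): Env = {
    val outer = this
    new Env {
      def apply[B](w: Var[B]): B = w match {
        case _: v.type => x.asInstanceOf[B] // w eq v, hence (morally) A = B
        case _         => outer.apply(w)
      }
    }
  }
}

object EmptyEnv extends Env {
  def apply[A](v: Var[A]): A = throw new NoSuchElementException("not found: " + v.name)
}
```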
I'm kind of surprised you'd say that it "should be OK that you can upcast `Map` to `Iterable`, since `Iterable` is covariant"

Do you have a specific example of unsoundness due to `Iterable` in mind, assuming `K` in `Map` does have the `@phantom` annotation? If we have `x: Iterable[A]` and `y: Iterable[B]` and we know `x.type =:= y.type`, all we can conclude is that there exists a type `C` such that `C <: A & B`, no? (And this does not give us anything, since it is always true: take `C = Nothing`.) It does not mean that `A =:= B`. I may be missing something here, though.
the cast mimics the way languages like Haskell reuse object instances when possible.
There are plenty of things that ML languages do which make no sense in Scala. Like the way OCaml handles GADT reasoning based on type equalities, as you very well know!
@LPTK indeed the paper does show an example! In the future, I shall try to read every paper linked in any thread which discusses what should we do if a user, with determination worthy of a better cause, tries to point a smoking gun at their own foot and blow it off - I think that's going to be a great approach!
But seriously, @Blaisorblade did explicitly mention that the paper provided an example, and I glossed over that - 'scuse me. I guess this means that the "just stop inferring constraints from singleton type checks" solution also has a hole in it.
To be more specific about my problem with `@phantom`: there is no issue in the specific example of `Map` and `Iterable`. There is an issue in the general example of a class `C[…, @phantom T, …]` inheriting from `D[…, T, …]`. What that means is that we should not infer GADT constraints from the relevant type parameter of `D`, but we have no hope of knowing that if all we see is `D`. There's a whole bunch of things to consider here - should we simply require that the relevant type parameter of `D` is also `@phantom`, or should we permit the inheritance if the type parameter is co/contra-variant? What happens if we instead inherit from `D[…, F[T], …]` for some `F`? Are there `F`s such that no `@phantom` is necessary? We might say "just require the user to manually find a good place to stick the `@phantom` annotation", but what happens if the user cannot modify the definition of `D`?
@AleksanderBG @LPTK While discussions can be frustrating, I recommend avoiding such passive-aggressive sarcasm :-) It’d have been more effective if I pasted the example myself, since my remark needed two clarifications for two different and intelligent readers; thanks @LPTK for doing so.
@AleksanderBG Instead of `@phantom`, maybe we should use some `@cophantom` annotation (maybe to be called `@gadt`, but let's not bikeshed yet), and enable (some) GADT inference features only for classes with such annotations (so, to be used in the cases opposite to `@phantom`, whence that name).

That could work because Scala 2 essentially didn't have working GADTs, so it's okay to require an annotation to enable them; and because it propagates in a better way: you can't safely use these casts for any descendant of a `@gadt` class.
@neko-kai The problem doesn't affect Scala 2 the same way, because the compiler does give an unchecked warning. ClassCastExceptions are bugs if they don't involve code with unchecked warnings, and (usually) if they don't involve casts (as @sjrd points out). But these casts have in fact been safe for a long time!
And in fact I slightly disagree with @sjrd: under certain conditions you can formally prove that some well-behaved code using casts is in fact safe, as done by the RustBelt research project. We can’t do it for full Scala (and not anytime soon), but I’ve formalized in Coq two examples for a pDOT variant (and submitted a paper, preprint at https://iris-project.org/pdfs/2020-dot-submission.pdf; sorry to toot my own horn, but it’s relevant). Of course, you can’t allow both those examples and singleton inference at the same time; and allowing unsound examples is tricky business (you must use the same semantics). Last, my semantics cannot prove these casts sound yet; I first need support for HK-DOT.
@Blaisorblade
The problem doesn’t affect Scala 2 the same way, because the compiler does give an unchecked warning. ClassCastException are bugs if they don’t involve code with unchecked warnings
Since Scala 2 already produces an unchecked warning for such patterns then Dotty could sidestep the issue by doing that too. This won't be a regression compared to Scala 2, as an ordinary user I'd deem that acceptable.
@Blaisorblade don't worry, @AleksanderBG and I are just joking (we're friends in real life).
Yeah, implementing `@phantom` correctly sounds tricky, because we have to make sure it doesn't leak, so that would probably be a lot of effort. Given that this is a fringe feature anyway, I think just having an `unchecked` warning would be good enough for now.
Yeah, I'd be up for unchecked + maybe marking this as stat:revisit. I'm sure more urgent bugs abound :-)
I'm sure more urgent bugs abound
unfortunately those bugs aren't as much fun to discuss :-)
Given that 1) instance reuse tricks seem to be reasonably common (we've found at least two in stdlib) and 2) singleton type patterns that can cause the issue seem to be exceedingly rare, simply emitting a warning on all such patterns sounds good to me.
In the standard library, `=:=` is defined as follows:

I always thought this was perfectly fine, but thanks to Dotty's world-class GADT support, it isn't anymore :)
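From memory, the encoding in question looks approximately like this (a sketch, not the exact `Predef` source):

```scala
object Sketch {
  // One witness class for all type equalities...
  sealed abstract class =:=[From, To] extends (From => To) with Serializable

  object =:= {
    private val singleton_=:= = new =:=[Any, Any] { def apply(x: Any): Any = x }
    // ...and a single shared instance cast to every A =:= A. Sound under
    // erasure, but the same object now witnesses unrelated equalities.
    implicit def tpEquals[A]: A =:= A = singleton_=:=.asInstanceOf[A =:= A]
  }
}
```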
The same code also compiles and crashes in scalac, but produces an unchecked erasure warning which I believe is incorrect: if it weren't for the cast used to get an instance of `=:=`, this code would be perfectly fine.

What can we do about this? In the short term, either do nothing or special-case the GADT handling of `=:=` in the compiler. In the long term, we should rewrite `=:=` and `<:<` using an opaque type to some dummy value:

There's still a cast (EDIT: actually not needed), but the use of opaque types prevents the bogus inference of GADT bounds: https://github.com/lampepfl/dotty/blob/b448db2454e003f1448a31746daed11001a2ec86/compiler/src/dotty/tools/dotc/core/TypeComparer.scala#L869-L870

(another solution would be to somehow enforce that values of type `=:=` must always be erased at runtime, and therefore do not have a notion of reference equality)
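A minimal sketch of what such an opaque-type encoding could look like (names and details assumed, not the actual proposal):

```scala
object TypeEqSketch {
  // The witness is an opaque alias of a dummy value type: there is no class
  // instance whose identity a singleton type test could reason about.
  opaque type =:=[From, To] = Unit

  def refl[A]: A =:= A = () // no cast needed to produce a witness

  extension [From, To](ev: From =:= To)
    def substitute(x: From): To = x.asInstanceOf[To]
}
```

Outside `TypeEqSketch` the alias is abstract, so user code can only obtain an `A =:= A` via `refl`, and there is no shared object identity to feed into GADT inference.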