smarter opened this issue 4 years ago
Can we find a single case where inferring GADT constraints from singleton type checks is actually useful? Taking object identity into account when inferring GADT constraints sounds deeply bogus to me. As we can see, it also prevents useful optimizations such as reusing a single instance of `Nil` for an invariant list. The simplest fix would then be to just infer no GADT constraints from singleton type checks.
Can we find a single case where inferring GADT constraints from singleton type checks is actually useful?
I don't have a concrete example, but being able to infer constraints from equality sounds like something that'd be useful when writing theorem-prover-style code; maybe @Blaisorblade has an idea?
it also prevents useful optimizations such as reusing a single instance of Nil
Not at all. `List` is covariant, so `Nil` is perfectly sound as is. This is really only a problem when people rely on `asInstanceOf` working because of type erasure.
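For illustration, with a covariant list the sharing needs no cast at all (a toy sketch, not the real `List`):

```scala
// A toy covariant list (a sketch, not the real scala.collection List).
sealed trait MyList[+A]
case object MyNil extends MyList[Nothing]
final case class Cons[+A](head: A, tail: MyList[A]) extends MyList[A]

// Covariance gives MyList[Nothing] <: MyList[A] for every A,
// so the single MyNil instance is shared without any cast:
val ints: MyList[Int] = MyNil
val strs: MyList[String] = MyNil
```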
Can we find a single case where inferring GADT constraints from singleton type checks is actually useful?
The typing rule makes some sense for objects with type members, tho I’m not sure I can think of a use case there; maybe when you match hlists against null? Emir et al. (2007) present another use-case (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.5295&rep=rep1&type=pdf), but I’m not sure it’s very convincing...
This has been discussed a long time ago in the scalac bug tracker (and apparently it's the reason why polymorphic, non-specialized zero-arg givens aren't lazy vals by default?..). I can't find the link now but I mentioned it in
EDIT: I don’t think the singleton type cast trick is relevant, unless DOT specifies semantics for reference equality? You MUST opt in to insanity for it to lead to unsoundness; it can also be detected and require a language flag.
@smarter
There's still a cast
Surely, you could also avoid the cast by putting the appropriate `private[scala]` methods in the companion objects of the opaque types, no? (Or really just putting `refl` itself there.)
@AleksanderBG
Can we find a single case where inferring GADT constraints from singleton type checks is actually useful?
I'm sure there are many and we just haven't thought of them yet :^) Please do not make things worse just to avoid making them right!
Another way this could be fixed is by using `erased`:

```scala
type =:=[-From, +To]
type <:<[-From, +To] <: =:=[From, To]

erased implicit def refl[A]: A =:= A = ???

extension ops on [From, To](erased ev: =:=[From, To]) {
  def apply(x: From): To = x.asInstanceOf[To]
}
```

The upside (performance-wise and soundness-wise) and downside (usability-wise) is that all `=:=` parameters would need to be annotated with `erased`. We can also get rid of the annoying singleton instance.

Note: collective extensions do not support `erased` yet, but they should.
@nicolasstucki what if people have code like this?
```scala
def test[T: <:<[Int, *]](arr: ArrayBuffer[Int]): ArrayBuffer[T] = arr.map(summon)
```
@LPTK what we could also do is add a `@phantom` annotation to type parameters of classes which says "never infer GADT constraints from this type parameter". Then we could annotate both type parameters of `=:=` with this annotation and be on our merry way, same with the `Nil` of an invariant list. An additional benefit is that I can see how to implement this without too much effort.
@AleksanderBG It seems you never want to mark only one parameter as phantom (also, I'm not sure the name is good). It's a bit annoying that this annotation is specific for singleton type tests and singleton values casted to multiple types, but I'm not sure of an easy way to generalize it.
OTOH, it seems none of the proposals can go in for 3.0 (except a compiler-hardcoded blacklist of types where GADT inference must be avoided), and I'm unsure which can go in for 3.1...
@Blaisorblade This `@phantom` is basically another view of Haskell's type parameter roles. Both for us and for Haskell, "phantom" means "not reflected in the underlying runtime structure". Contrast this with (https://gitlab.haskell.org/ghc/ghc/wikis/roles):
It is often the case that programmers use type variables simply to constrain the type checker, not to make any statement about the runtime representation of a type.
It's just that we're approaching the issue from another angle, since in Haskell people wanted a safe way of coercing between values of two types, and what we want is not inferring GADT constraints from runtime instance checks on values which were "unsafely" coerced.
What exactly makes the compiler assume that it can compile this to a runtime equality check instead of producing an unchecked erasure warning (and why would an opaque type avoid it)?
Casting a single instance of an immutable and invariant class to whatever instance is required is commonplace in the standard library and in other Scala code.
What exactly makes the compiler assume that it can compile this to a runtime equality check
That's the semantics of `x: foo.type` type tests according to the spec.
instead of producing an unchecked erasure warning
there's nothing that is left unchecked, so there's no warning
and why would an opaque type avoid it
opaque types are not injectives like class types, if we have Foo[X] = Foo[Y]
we cannot conclude anything about the relationship between X
and Y
, because we could have defined opaque type Foo[T] = Int
for example.
Casting a single instance of an immutable and invariant class to whatever instance is required is commonplace in the standard library and in other Scala code.
Yes, and I wonder if we could get rid of this pattern. For `=:=` it seems that we easily can, using an opaque type. But I don't have a good solution for `HashMap.empty`:
https://github.com/scala/scala/blob/8f20a7d84924939698717d26b3e0e5a5c0da829c/src/library/scala/collection/immutable/HashMap.scala#L2140-L2146
If `HashMap` were contravariant in `K` then this would be fine (as we couldn't infer any kind of type equality between two instances), but I assume there's a good reason it isn't?
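For reference, the instance-sharing pattern linked above looks roughly like this (a simplified sketch with made-up names, not the actual `HashMap` source):

```scala
// An invariant container that reuses one empty instance across all type arguments.
final class MyMap[K, V] private (val entries: List[(K, V)]) {
  def size: Int = entries.size
}

object MyMap {
  private val theEmpty = new MyMap[Any, Any](Nil)
  // Sound under erasure (the empty map stores no K or V values),
  // but the cast makes one object inhabit MyMap[K, V] for every K and V.
  def empty[K, V]: MyMap[K, V] = theEmpty.asInstanceOf[MyMap[K, V]]
}

val a: MyMap[Int, String] = MyMap.empty
val b: MyMap[String, Int] = MyMap.empty
// a and b are the same object despite incompatible static types
```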
FWIW, as I hinted earlier, this match semantics dates back to the paper introducing case classes in 2007 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.547.5623&rep=rep1&type=pdf). Check out Fig. 11 of that paper.
if HashMap was contravariant in K ... but I assume there's a good reason it isn't
`HashMap` extends `Iterable[(K, V)]` (but even if it didn't, consider such methods as `keysIterator`).
What exactly makes the compiler assume that it can compile this to a runtime equality check
That's the semantics of `x: foo.type` type tests according to the spec.

Btw, that would be `eq` checking, not even `==`/`equals`:
- A singleton type 𝑝.type. This type pattern matches only the value denoted by the path 𝑝 (the eq method is used to compare the matched value to 𝑝).
(https://www.scala-lang.org/files/archive/spec/2.13/08-pattern-matching.html#type-patterns)
opaque types are not injective like class types: if we have `Foo[X] = Foo[Y]` we cannot conclude anything about the relationship between `X` and `Y`, because we could have defined `opaque type Foo[T] = Int` for example.
In the example you're not only establishing `x: B =:= B` but `x: y.type`, i.e. `x` and `y` are the same object. Shouldn't this be sufficient to conclude that `A = B` even if `=:=` is not injective? Is there a fundamental reason why this is not the case, or does the opaque type exploit a limitation of the type checker?
Shouldn't this be sufficient to conclude that A = B even if =:= is not injective?
Injectivity is precisely the property you need to conclude `A = B` from `Foo[A] = Foo[B]`.
does the opaque type exploit a limitation of the type checker?
No, you can achieve the same thing in other ways:

```scala
type Foo[T] = String

("": Foo[Int]) eq ("": Foo[String])
implicitly[Foo[Int] =:= Foo[String]]
```

Again, you can't conclude anything about the relationship between `Int` and `String` from these.
It doesn't really answer my question, but after thinking about it some more I agree. Even `x = y` (rather than `Foo[A] = Foo[B]`) is not sufficient to prove `A = B` if `Foo` is not injective.

Still, the opaque type looks like a hack to me. It prevents `A = B` but not the premise `a.type = b.type`, which is already wrong if you rely on erasure to cast a shared instance to either type. So I suppose that's +1 from me for `@phantom`.
Still, the opaque type looks like a hack to me. It prevents `A = B` but not the premise `a.type = b.type`, which is already wrong if you rely on erasure to cast a shared instance to either type. So I suppose that's +1 from me for `@phantom`.
The `@phantom` approach will behave the same with respect to `a.type =:= b.type` reasoning. So if opaque types are a hack, then so is `@phantom`. Personally I think either is fine, but the `@phantom` approach will break much less code, so I'm in favor of it.
I think the bigger problem right now is `HashMap.empty` and similar things in the standard library. If we can't find a satisfying solution for those then we'll have to follow @AleksanderBG's original proposal and stop inferring GADT bounds from type tests like those, in which case `=:=` might as well stay the way it is right now.
Wouldn't a `@phantom` annotation work for `HashMap` in the same way, as soon as forward BC is dropped and adding classes becomes possible again (which IIRC is planned for 3.0 anyway and was publicly announced)?
That seems weird to me since K is definitely not a phantom type parameter in a hashmap.
also not sure how this phantom annotation would work with class hierarchies, especially since a HashMap[K, V] is an Iterable[(K, V)] as mentioned above
That seems weird to me since K is definitely not a phantom type parameter in a hashmap.
I don't think `Ord[K]` makes a difference; containing a `K` or an `Array[K]` would, and would make the cast unsound.

And there exists one hashmap where they are phantoms; that's clearly enough. Same for the hypothetical "invariant `Nil`" example.
also not sure how this phantom annotation would work with class hierarchies, especially since a HashMap[K, V] is an Iterable[(K, V)] as mentioned above
Argh, great point. Inference is unsound on `Iterable` as well (and any superclass), for the same reason.
- `m1: HashMap[K1, V1]`, `m2: HashMap[K2, V2]`: what can you safely infer from `m1.type = m2.type`? Nothing, because both `m1` and `m2` could be `HashMap.empty`; so both parameters are phantoms.
- `m1: Iterable[A]`, `m2: Iterable[B]`: what can you safely infer from `m1.type = m2.type`? Still nothing, because both `m1` and `m2` could still be `HashMap.empty`.

So, for soundness, if `K[_]` is invariant and `a1: K[A1]`, `a2: K[A2]`, `a1: a2.type`, inferring `A1 = A2` is only sound if `K[_]` contains no `empty`-like values (values that belong to the intersection of `K[B1]` and `K[B2]` for some `B1 != B2`). This property is stronger than injectivity: not only must `K` map different input types to different output types, but the output types must be disjoint.
`Iterable` is covariant, so it should not be a problem.
I'm in favor of making `K` in `HashMap` a `@phantom` type (and having it special-cased in the compiler in the meantime, if we can't change it now).
That seems weird to me since K is definitely not a phantom type parameter in a hashmap.
The `@phantom` annotation removes reasoning capabilities from user code, so it can be soundly approximated. It should be understood as "potentially phantom" (and, as @Blaisorblade says, it indeed is for the empty `HashMap` instance).
How about emitting an `unchecked` warning for type patterns checking for a singleton (`e: p.type`)? In a way, this is similar to a type pattern `e: K[String]`: at run-time we cannot reliably check whether the value conforms to the type or not.

```scala
scala> def f[T](o: Option[T]): String = o match { case s: Option[String] => s.get }
                                                          ^
       warning: non-variable type argument String in type pattern Option[String] is unchecked since it is eliminated by erasure
f: [T](o: Option[T])String

scala> f(Some(1))
java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
  at .f(<console>:1)
  ... 28 elided
```
In a way, this is similar to a type pattern `e: K[String]`: at run-time we cannot reliably check whether the value conforms to the type or not.

I think the correct parallel to what you're proposing would be emitting an `unchecked` warning when users match a `List[String]` with `case (x: String) :: xs`, since we cannot reliably check whether the pattern is always valid: `x` may not conform to `String` at runtime, because someone could have used an `asInstanceOf` on the scrutinee list at some point earlier.

In both cases, what this is really about is unsound uses of `asInstanceOf`.
Of course, the "matching scrutinee types" use case is anecdotal in comparison to normal pattern matching, so it may still be okay to have the warning.
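A sketch of that upstream-cast scenario (using a plain binder rather than a type pattern; the failure mode is the same):

```scala
// Somewhere upstream, an unsound cast smuggles an Int into a List[String]:
val xs: List[String] = List[Any](42).asInstanceOf[List[String]]

// This match is statically fine: the compiler knows x must be a String.
// At runtime, the implicit checkcast on the element throws ClassCastException.
def firstLength(ss: List[String]): Int = ss match {
  case x :: _ => x.length
  case Nil    => 0
}
```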
The real issue here is not pattern matching, it is not GADTs, it is not singleton types. The problem is the cast. As soon as you do a cast, you give up your right to soundness.

We use theoretically invalid casts that are actually valid under erasure, for efficiency. But that is at the expense of soundness. No amount of patching it up after the fact will truly fix the issue.
@sjrd to recapture what was said before: the cast mimics the way languages like Haskell reuse object instances when possible. ~~So far, it wasn't possible to cause a type soundness issue with such casts; now, thanks to the World Class GADT Constraint Inference, it is possible.~~ EDIT: Apparently this was already a problem in Scala 2, thanks @neko-kai.
@LPTK I'm kind of surprised you'd say that it "should be OK that you can upcast `Map` to `Iterable`, since `Iterable` is covariant" - that's unusually unprincipled for you. I think that's actually a good argument as to why `@phantom` is not as great a solution as it may appear at first glance - for instance, it may not be possible to actually attach the annotation to the correct superclass, since one may not be able to modify it. Also, now that we need to take upcasting/downcasting into account, we'd significantly complicate the implementation - I didn't think of those when I said that the implementation is going to be simple.
One final observation: we are 28 comments into this discussion, and I have not observed a single person give an example of inferring GADT constraints from singleton type checks being actually useful. Meanwhile, we've seen multiple potential makeshift solutions, and all of them have some holes. If nobody can find such an example, I'd be in favour of simply not inferring GADT constraints from such type checks.
@AleksanderBG This issue is not new to Dotty, it works the same way in Scala 2
I guess GADT constraints from singletons could be useful whenever there's a GADT with object members:
```scala
enum TimeUnit[T] {
  def convert(t: Time): Time = this match {
    case TMonth => t.truncateToMonth
    case TDay   => t.truncateToDay
  }

  case TMonth extends TimeUnit[Month]
  case TDay   extends TimeUnit[Day]
}

def convert[T](tu: TimeUnit[T], t: Time): T = tu match {
  case TMonth => Month(tu.convert(t))
  case TDay   => Day(tu.convert(t))
}
```
@neko-kai that's a good point, but in these cases it's definitely fine to infer constraints, since we're comparing with a statically known object. The example I would be looking for would check if two unknown objects (say, arguments to a function) are actually the same object, and then use the GADT constraints following from that.
@AleksanderBG, there is a paper which makes good use of these constraints, which has already been cited twice by @Blaisorblade. Is there a reason you keep ignoring it? Let me paste the example below for you, since you seem to have trouble opening the PDF 😛
It's about encoding type-safe environments:
```scala
abstract class Env {
  def apply[a](v: Var[a]): a
  def extend[a](v: Var[a], x: a) = new Env {
    def apply[b](w: Var[b]): b = w match {
      case _: v.type => x // v eq w, hence a = b
      case _         => Env.this.apply(w)
    }
  }
}

object empty extends Env {
  def apply[a](x: Var[a]): a = throw new Error("not found: " + x.name)
}

// Evaluation:
def eval[a](t: Term[a], env: Env): a = t match {
  ...
  case f: Lam[b, c] => (y: b) => eval(f.e, env.extend(f.x, y))
  ...
```
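For reference, a self-contained variant of the paper's `Env` that compiles today, with `Var` assumed to be a plain class identified only by object identity, and the cast kept explicit (the singleton-GADT rule under discussion is what would justify dropping it):

```scala
// Assumed simple representation: a Var is identified by object identity only.
class Var[A](val name: String)

abstract class Env {
  def apply[A](v: Var[A]): A

  def extend[A](v: Var[A], x: A): Env = {
    val outer = this
    new Env {
      def apply[B](w: Var[B]): B = w match {
        case _: v.type => x.asInstanceOf[B] // w eq v, hence (morally) A = B
        case _         => outer.apply(w)
      }
    }
  }
}

object EmptyEnv extends Env {
  def apply[A](v: Var[A]): A = throw new NoSuchElementException("not found: " + v.name)
}
```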
I'm kind of surprised you'd say that it "should be OK that you can upcast `Map` to `Iterable`, since `Iterable` is covariant"

Do you have a specific example of unsoundness due to `Iterable` in mind, assuming `K` in `Map` does have the `@phantom` annotation? If we have `x: Iterable[A]` and `y: Iterable[B]` and we know `x.type =:= y.type`, all we can conclude is that there exists a type `C` such that `C <: A & B`, no? (And this does not give us anything, since it is always true: take `C = Nothing`.) It does not mean that `A =:= B`. I may be missing something here, though.
the cast mimics the way languages like Haskell reuse object instances when possible.
There are plenty of things that ML languages do which make no sense in Scala. Like the way OCaml handles GADT reasoning based on type equalities, as you very well know!
@LPTK indeed the paper does show an example! In the future, I shall try to read every paper linked in any thread which discusses what should we do if a user, with determination worthy of a better cause, tries to point a smoking gun at their own foot and blow it off - I think that's going to be a great approach!
But seriously, @Blaisorblade did explicitly mention that the paper provided an example, and I glossed over that - 'scuse me. I guess this means that the "just stop inferring constraints from singleton type checks" solution also has a hole in it.
To be more specific about my problem with `@phantom`: there is no issue in the specific example of `Map` and `Iterable`. There is an issue in the general example of a class `C[…, @phantom T, …]` inheriting from `D[…, T, …]`. What that means is that we should not infer GADT constraints from the relevant type parameter of `D`, but we have no hope of knowing that if all we see is `D`. There's a whole bunch of things to consider here - should we simply require that the relevant type parameter of `D` is also `@phantom`, or should we permit the inheritance if the type parameter is co/contra-variant? What happens if we instead inherit from `D[…, F[T], …]` for some `F`? Are there `F`s such that no `@phantom` is necessary? We might say "just require the user to manually find a good place to stick the `@phantom` annotation", but what happens if the user cannot modify the definition of `D`?
@AleksanderBG @LPTK While discussions can be frustrating, I recommend avoiding such passive-aggressive sarcasm :-) It’d have been more effective if I pasted the example myself, since my remark needed two clarifications for two different and intelligent readers; thanks @LPTK for doing so.
@AleksanderBG Instead of `@phantom`, maybe we should use some `@cophantom` annotation (maybe to be called `@gadt`, but let's not bikeshed yet), and enable (some) GADT inference features only for classes with such annotations (so, to be used in the cases opposite to `@phantom`, whence that name).

That could work because Scala 2 essentially didn't have working GADTs, so it's okay to require an annotation to enable them; and because it propagates in a better way: you can't safely use these casts for any descendant of a `@gadt` class.
@neko-kai The problem doesn't affect Scala 2 the same way, because the compiler does give an unchecked warning. ClassCastExceptions are bugs if they don't involve code with unchecked warnings, and (usually) if they don't involve casts (as @sjrd points out). But these casts have in fact been safe for a long time!
And in fact I slightly disagree with @sjrd: under certain conditions you can formally prove that some well-behaved code using casts is in fact safe, as done by the RustBelt research project. We can’t do it for full Scala (and not anytime soon), but I’ve formalized in Coq two examples for a pDOT variant (and submitted a paper, preprint at https://iris-project.org/pdfs/2020-dot-submission.pdf; sorry to toot my own horn, but it’s relevant). Of course, you can’t allow both those examples and singleton inference at the same time; and allowing unsound examples is tricky business (you must use the same semantics). Last, my semantics cannot prove these casts sound yet; I first need support for HK-DOT.
@Blaisorblade
The problem doesn’t affect Scala 2 the same way, because the compiler does give an unchecked warning. ClassCastException are bugs if they don’t involve code with unchecked warnings
Since Scala 2 already produces an unchecked warning for such patterns then Dotty could sidestep the issue by doing that too. This won't be a regression compared to Scala 2, as an ordinary user I'd deem that acceptable.
@Blaisorblade don't worry, @AleksanderBG and I are just joking (we're friends in real life).
Yeah, implementing `@phantom` correctly sounds tricky, because we have to make sure it doesn't leak, so that would probably be a lot of effort. Given that this is a fringe feature anyway, I think just having an `unchecked` warning would be good enough for now.
Yeah, I'd be up for unchecked + maybe marking this as stat:revisit. I'm sure more urgent bugs abound :-)
I'm sure more urgent bugs abound
unfortunately those bugs aren't as much fun to discuss :-)
Given that 1) instance reuse tricks seem to be reasonably common (we've found at least two in stdlib) and 2) singleton type patterns that can cause the issue seem to be exceedingly rare, simply emitting a warning on all such patterns sounds good to me.
In the standard library, `=:=` is defined as follows:

I always thought this was perfectly fine, but thanks to Dotty's world-class GADT support, it isn't anymore :)
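From memory, the encoding in question looks approximately like this (a sketch, not the exact `Predef` source):

```scala
object Sketch {
  // One witness class for all type equalities...
  sealed abstract class =:=[From, To] extends (From => To) with Serializable

  object =:= {
    private val singleton_=:= = new =:=[Any, Any] { def apply(x: Any): Any = x }
    // ...and a single shared instance cast to every A =:= A. Sound under
    // erasure, but the same object now witnesses unrelated equalities.
    implicit def tpEquals[A]: A =:= A = singleton_=:=.asInstanceOf[A =:= A]
  }
}
```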
The same code also compiles and crashes in scalac, but produces an unchecked erasure warning which I believe is incorrect: if it weren't for the cast used to get an instance of `=:=`, this code would be perfectly fine.

What can we do about this? In the short term, either do nothing or special-case the GADT handling of `=:=` in the compiler. In the long term, we should rewrite `=:=` and `<:<` using an opaque type to some dummy value:

There's still a cast (EDIT: actually not needed), but the use of opaque types prevents the bogus inference of GADT bounds: https://github.com/lampepfl/dotty/blob/b448db2454e003f1448a31746daed11001a2ec86/compiler/src/dotty/tools/dotc/core/TypeComparer.scala#L869-L870

(another solution would be to somehow enforce that values of type `=:=` must always be erased at runtime, and therefore do not have a notion of reference equality)
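A minimal sketch of what such an opaque-type encoding could look like (names and details assumed, not the actual proposal):

```scala
object TypeEqSketch {
  // The witness is an opaque alias of a dummy value type: there is no class
  // instance whose identity a singleton type test could reason about.
  opaque type =:=[From, To] = Unit

  def refl[A]: A =:= A = () // no cast needed to produce a witness

  extension [From, To](ev: From =:= To)
    def substitute(x: From): To = x.asInstanceOf[To]
}
```

Outside `TypeEqSketch` the alias is abstract, so user code can only obtain an `A =:= A` via `refl`, and there is no shared object identity to feed into GADT inference.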