scala / scala3

The Scala 3 compiler, also known as Dotty.
https://dotty.epfl.ch
Apache License 2.0
5.83k stars 1.05k forks source link

Misbehavior of macro-generated match when matching case object with lowercase name #20350

Open MateuszKubuszok opened 5 months ago

MateuszKubuszok commented 5 months ago

Compiler version

3.3.3

Minimized code

repro.scala

//> using scala 3.3.3

import scala.quoted.*

object Macros {

  def matchOnImpl[A: Type](a: Expr[A])(using quotes: Quotes): Expr[Unit] = {
    import quotes.*, quotes.reflect.*

    // workaround to contain @experimental from polluting the whole codebase
    object FreshTerm {
      private val impl = quotes.reflect.Symbol.getClass.getMethod("freshName", classOf[String])

      def generate(prefix: String): String = impl.invoke(quotes.reflect.Symbol, prefix).asInstanceOf[String]
    }

    extension (sym: Symbol)
     def isPublic: Boolean = !sym.isNoSymbol &&
          !(sym.flags.is(Flags.Private) || sym.flags.is(Flags.PrivateLocal) || sym.flags.is(Flags.Protected) ||
            sym.privateWithin.isDefined || sym.protectedWithin.isDefined)

    def isSealed[A: Type]: Boolean =
      TypeRepr.of[A].typeSymbol.flags.is(Flags.Sealed)

    def extractSealedSubtypes[A: Type]: List[Type[?]] = {
      def extractRecursively(sym: Symbol): List[Symbol] =
        if sym.flags.is(Flags.Sealed) then sym.children.flatMap(extractRecursively)
        else if sym.flags.is(Flags.Enum) then List(sym.typeRef.typeSymbol)
        else if sym.flags.is(Flags.Module) then List(sym.typeRef.typeSymbol.moduleClass)
        else List(sym)

      extractRecursively(TypeRepr.of[A].typeSymbol).distinct.map(typeSymbol =>
        typeSymbol.typeRef.asType
      )
    }

    if isSealed[A] then {
      val cases = extractSealedSubtypes[A].map { tpe =>
        val sym = TypeRepr.of(using tpe).typeSymbol
        val bindName = Symbol.newVal(Symbol.spliceOwner, FreshTerm.generate(sym.name.toLowerCase), TypeRepr.of[A],Flags.EmptyFlags, Symbol.noSymbol)
        val body = '{ println(${ Expr(sym.name) }) }.asTerm

        if sym.flags.is(Flags.Enum | Flags.JavaStatic) then
          CaseDef(Bind(bindName, Ident(sym.termRef)), None, body)
        else if sym.flags.is(Flags.Module) then
          CaseDef(
            Bind(bindName, Ident(sym.companionModule.termRef)),
            None,
            body
          )
        else
          CaseDef(Bind(bindName, Typed(Wildcard(), TypeTree.of(using tpe))), None, body)
      }
      Match(a.asTerm, cases).asExprOf[Unit]
    } else '{ () }
  }

  inline def matchOn[A](a: A): Unit = ${ matchOnImpl[A]('{ a }) }
}

repro.test.scala

//> using dep org.scalameta::munit:1.0.0-RC1

package test

sealed trait Upper
object Upper {
  case object A extends Upper
  case object B extends Upper
  case object C extends Upper
}

sealed trait lower
object lower {
  case object a extends lower
  case object b extends lower
  case object c extends lower
}

class Test extends munit.FunSuite {

  test("should print its own name") {
    Macros.matchOn[Upper](Upper.A)
    Macros.matchOn[Upper](Upper.B)
    Macros.matchOn[Upper](Upper.C)

    Macros.matchOn[lower](lower.a)
    Macros.matchOn[lower](lower.b)
    Macros.matchOn[lower](lower.c)
  }
}
scala-cli test .

Output

A$
B$
C$
a$
a$
a$

Expectation

A$
B$
C$
a$
b$
c$

When case object with lowercased names is used match seem to fall through on the first case. For the same macro code the behavior is correct if the name of case object starts with an upper case.

I haven't observed such issue with enums.

Gedochao commented 5 months ago

behaviour still present on 3.5.0-RC1-bin-20240508-b10d64e-NIGHTLY

mbovel commented 3 months ago

@hamzaremmal do you think that would be appropriate for a spree?

mbovel commented 1 week ago

Still not sure if this is appropriate for a spree and how hard that would be to fix. cc @hamzaremmal

hamzaremmal commented 1 week ago

Sorry @mbovel, I haven't seen the first ping. I don't think it's suitable for a spree.

dos65 commented 1 week ago

It's not a bug in compiler. The actual issue is in macros. This version ( without a branch for module) works fine

Fixed version ```scala //> using scala 3.3.3 import scala.quoted.* object Macros { def matchOnImpl[A: Type](a: Expr[A])(using quotes: Quotes): Expr[Unit] = { import quotes.*, quotes.reflect.* // workaround to contain @experimental from polluting the whole codebase object FreshTerm { private val impl = quotes.reflect.Symbol.getClass.getMethod("freshName", classOf[String]) def generate(prefix: String): String = impl.invoke(quotes.reflect.Symbol, prefix).asInstanceOf[String] } extension (sym: Symbol) def isPublic: Boolean = !sym.isNoSymbol && !(sym.flags.is(Flags.Private) || sym.flags.is(Flags.PrivateLocal) || sym.flags.is(Flags.Protected) || sym.privateWithin.isDefined || sym.protectedWithin.isDefined) def isSealed[A: Type]: Boolean = TypeRepr.of[A].typeSymbol.flags.is(Flags.Sealed) def extractSealedSubtypes[A: Type]: List[Type[?]] = { def extractRecursively(sym: Symbol): List[Symbol] = if sym.flags.is(Flags.Sealed) then sym.children.flatMap(extractRecursively) else if sym.flags.is(Flags.Enum) then List(sym.typeRef.typeSymbol) else if sym.flags.is(Flags.Module) then List(sym.typeRef.typeSymbol.moduleClass) else List(sym) extractRecursively(TypeRepr.of[A].typeSymbol).distinct.map(typeSymbol => typeSymbol.typeRef.asType ) } if isSealed[A] then { val cases = extractSealedSubtypes[A].map { tpe => val sym = TypeRepr.of(using tpe).typeSymbol val bindName = Symbol.newVal(Symbol.spliceOwner, FreshTerm.generate(sym.name.toLowerCase), TypeRepr.of[A],Flags.EmptyFlags, Symbol.noSymbol) val body = '{ println(${ Expr(sym.name) }) }.asTerm if sym.flags.is(Flags.Enum | Flags.JavaStatic) then CaseDef(Bind(bindName, Ident(sym.termRef)), None, body) else CaseDef(Bind(bindName, Typed(Wildcard(), TypeTree.of(using tpe))), None, body) } Match(a.asTerm, cases).asExprOf[Unit] } else '{ () } } ```

Explanation:

The removed from above code:

CaseDef(
    Bind(bindName, Ident(sym.companionModule.termRef)),
    None,
    body
)

generates the following code:

b match
  case a$$macro1 @ a => println("a")
  case b$$macro1 @ b => println("b")

which is equivavlent to case _ => println("a"). See this section about varid at https://www.scala-lang.org/files/archive/spec/3.4/08-pattern-matching.html#variable-patterns

Example in scastie - https://scastie.scala-lang.org/s0mcdSvpTqOJ3fPXDpnEtQ

jchyb commented 1 week ago

So in summary, we might have thought that this macro generates this (which would work correctly):

(lower.b: lower) match
    case amacro @ lower.a => (...)
    case bmacro @ lower.b => (...)
    case bmacro @ lower.c => (...)

but it actually interprets it as this:

(lower.b: lower) match
    case amacro @ a => (...)
    case bmacro @ b => (...)
    case bmacro @ c => (...)

This would make a lot of sense, as that explains the different behavior of Upper and lower, however the worrying part are the compiler printouts showing what gets generated:

lower.b match
        {
          case a$$macro$5 @ lower.a => println("a$")
          case b$$macro$5 @ lower.b => println("b$")
          case c$$macro$5 @ lower.c => println("c$")
        }

which looks correct. I guess there may be a node missing needed for PatternMatcher that does not usually get printed out in the compiler, but gets added in Typer or other phases (like the Typed node) - which is why the macro example doesn't work, but the regular example does.

dos65 commented 1 week ago

@jchyb

however the worrying part are the compiler printouts showing what gets generated:

Created Ident in macros goes with type lower.a - that's why it's printed like a Select.

However, it's an Ident and at PatternMatcher stage it sees Bind(" a$$macro$5", Ident("a")) and then it falls into WildcardPattern - https://github.com/scala/scala3/blob/e5f7272e55ba71f80c553bd66e54df480355420a/compiler/src/dotty/tools/dotc/transform/PatternMatcher.scala#L210

dwijnand commented 1 week ago

Yeah, we did tackle this at some point - I think it was a spree. And basically there's some code branch (I think it was within Quotes) that considers that lower is stable, so the tree representation can be folded down into just an Ident with the correct type. So on the one hand we can't write that correctly in source, on the other hand the prefix is indeed pure, so folding it isn't wrong per se... So I wasn't sure where the correct fix is; that is from the assumption that the macro user's code is reasonable - the workaround is changing the macro code.

jchyb commented 1 week ago

@dos65 @dwijnand Thank you for the explanations!

I imagine the best course of action would be to add a check for incorrect/unexpected Idents in -Xcheck-macros (the main fear here is there could be a lot of those in the already published libraries), alternatively we could sanitize Idents into correct forms in Quotes IdentModule.apply() depending on the termRef (this might be unpopular/cause other unexpected issues)