vincenzobaz / spark-scala3

Apache License 2.0

Problems using nested case classes #44

Closed Knorreman closed 1 year ago

Knorreman commented 1 year ago

Hello

We need to use nested case classes to model our data.

import org.apache.spark.sql.catalyst.encoders.encoderFor
import scala3encoders.given

case class Origin(countries: Seq[String])
case class Vegetables(name: String, origin: Origin)
val encoder = encoderFor[Vegetables]

This causes a compile error:

No given instance of type scala3encoders.derivation.Deserializer[Origin] was found.
I found:

scala3encoders.derivation.Deserializer.derivedProduct[Origin](
  Origin.$asInstanceOf[

      scala.deriving.Mirror.Product{
        type MirroredMonoType = Origin; type MirroredType = Origin;
          type MirroredLabel = ("Origin" : String);
          type MirroredElemTypes = Seq[String] *: EmptyTuple.type;
          type MirroredElemLabels = ("countries" : String) *: EmptyTuple.type
      }

  ],
scala.reflect.ClassTag.apply[Origin](classOf[Origin]))

But given instance derivedProduct in object Deserializer does not match type scala3encoders.derivation.Deserializer[Origin].
    val encoder = encoderFor[Vegetables]

Am I using the encoder incorrectly, or is there something else I should do?

vincenzobaz commented 1 year ago

Hello @Knorreman, I suspect the issue might be with encoderFor. If you try with

summon[Encoder[Vegetables]]

it should work

Knorreman commented 1 year ago

Hi @vincenzobaz, unfortunately I get exactly the same error with summon[Encoder[Vegetables]].

vincenzobaz commented 1 year ago

Are you using version 0.2.3?

This program

import org.apache.spark.sql.Encoder

object Vegetables extends App:
  import scala3encoders.given

  case class Origin(countries: Seq[String])
  case class Vegetables(name: String, origin: Origin)
  val encoder = summon[Encoder[Vegetables]]
  println(encoder.schema)

prints

[info] StructType(StructField(name,StringType,true),StructField(origin,StructType(StructField(countries,ArrayType(StringType,true),true)),true))

for me

Knorreman commented 1 year ago

Yes, I am using 0.2.3. But I was using Spark 3.4.1, and when I tried 3.3.2 it worked!

vincenzobaz commented 1 year ago

Cool! 3.3.2 is the Spark version the library is based on. We have just merged compatibility for 3.5.0, but it is not released yet.
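For anyone hitting the same error, a minimal build.sbt sketch of pinning Spark to the version that worked in this thread might look like the following. The Scala version and the choice of the spark-sql module are assumptions, not taken from the discussion. The spark-scala3 dependency line is left out on purpose, since it should stay exactly as it already is in your build.

```scala
// build.sbt (sketch): pin Spark to 3.3.2, the version reported to work with 0.2.3.
// The Scala version below is an assumption; use whatever Scala 3 release your project targets.
ThisBuild / scalaVersion := "3.3.0"

libraryDependencies ++= Seq(
  // Spark publishes no Scala 3 artifacts, so the Scala 2.13 build is used from Scala 3.
  ("org.apache.spark" %% "spark-sql" % "3.3.2").cross(CrossVersion.for3Use2_13)
  // Keep your existing spark-scala3 dependency (version 0.2.3) here, unchanged.
)
```

Once a release with support for newer Spark versions is available, the pin can be lifted by bumping both versions together.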

michael72 commented 1 year ago

So, the main cause of this error in 0.2.3 seems to show up as soon as you call summon[Encoder[Seq[String]]]. Then we get

[error]  29 |    val bla1 = summon[Encoder[Seq[String]]]
[error]     |                                           ^
[error]     |     Found:    (org.apache.spark.sql.catalyst.expressions.Expression,
[error]     |       org.apache.spark.sql.catalyst.WalkedTypePath) =>
[error]     |       org.apache.spark.sql.catalyst.expressions.Expression
[error]     |     Required: org.apache.spark.sql.catalyst.expressions.Expression =>
[error]     |       org.apache.spark.sql.catalyst.expressions.Expression
[error]     |---------------------------------------------------------------------------
[error]     |Inline stack trace
[error]     |- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

But this was also the part that was reworked in Spark 3.5.0, and I seem to have carried that over with the new Helper class. With a publishLocal of a 0.2.4 version it worked for me.
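To illustrate the shape of the mismatch in the error above, here is a small self-contained sketch. It does not use Spark at all: Expression and WalkedTypePath are stand-in case classes, and applyOneArg, twoArg and adapt are hypothetical names invented for this example. It only shows why a two-argument function is rejected where a one-argument function is required, and how a thin adapter can bridge the two. It is not meant to describe how the library's Helper class is actually implemented.

```scala
// Stand-in types: NOT Spark's real classes, only the shapes named in the error message.
final case class Expression(repr: String)
final case class WalkedTypePath(steps: List[String] = Nil)

// The shape the callee expects ("Required" in the error): Expression => Expression.
def applyOneArg(f: Expression => Expression, in: Expression): Expression = f(in)

// The shape that was supplied ("Found" in the error): (Expression, WalkedTypePath) => Expression.
val twoArg: (Expression, WalkedTypePath) => Expression =
  (e, path) => Expression(s"deserialize(${e.repr})")

// applyOneArg(twoArg, Expression("col"))
// ^ does not compile: a two-argument function is not a one-argument function.

// Hypothetical adapter: supply a default path so the two-argument function fits.
def adapt(f: (Expression, WalkedTypePath) => Expression): Expression => Expression =
  e => f(e, WalkedTypePath())

@main def demo(): Unit =
  println(applyOneArg(adapt(twoArg), Expression("col")))
  // prints Expression(deserialize(col))
```

A compatibility layer that picks the right function shape per Spark version is one common way to support several versions from a single code base.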

Knorreman commented 1 year ago

Nice! Any ideas when 0.2.4 will be published? We can downgrade to 3.3.2 in the meantime! Thanks guys!

vincenzobaz commented 1 year ago

I am running the release now.