vincenzobaz / spark-scala3

Apache License 2.0
89 stars 15 forks source link

Make encoder for product match the one in Spark #47

Closed joan38 closed 1 year ago

joan38 commented 1 year ago

Copied from https://github.com/apache/spark/blob/v3.3.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L389-L395

Fixes #46

michael72 commented 1 year ago

Hey @joan38 ‐ the changes look good. It is around the same code lines where I did some changes in my PR. But I think those changes should be independent. Could you also check against spark v3.5.0 if that code is still the same? Also a small unit test that only works together with your changes would be great I think. Cheers!

joan38 commented 1 year ago

@michael72 Spark 3.5.0 completely changed how encoders work and introduced new Expressions. If we do what they do, this lib will not work with older Spark 3 versions.

I added more stuff that was breaking joinWith and a test.

michael72 commented 1 year ago

@joan38 I currently don't have a laptop with me so it's difficult to check, but I thought between 3 4.1 and 3.5.0 they moved some code around, but in essence it was the same, but I could be completely wrong. However taking an older version like 3.3.x as a reference is fine, but completely ignoring the newest feels wrong to me also because this library tries to remedy the very fact that evolvement of Scala was completely ignored in spark! From my POV the PR is fine though

vincenzobaz commented 1 year ago

I checked out this PR and rebased on my spark version matrix PR. compile and test are successful but runMain on StarWars fails.

michael72 commented 1 year ago

@vincenzobaz the Starwars example wouldn't work with my PR either because of the type checks applied. That example would have to be reworked. Did you copy that from somewhere or was it "handcrafted"? 😉

vincenzobaz commented 1 year ago

halfway copied halfway hand crafted :laughing: We can change it as much as we want, I do not have strong feelings about it.

In fact it is quite ugly

joan38 commented 1 year ago

The newly added thing is this whole AgnosticEncoder: https://github.com/apache/spark/blob/master/sql/api/src/main/scala/org/apache/spark/sql/catalyst/encoders/AgnosticEncoder.scala#L114

If I'm not mistaken, I don't see how we can adopt it without requiring Spark 3.5.

michael72 commented 1 year ago

Well if necessary we can copy it - or just everything that we need - same as I did (more or less) with the Helper file. Lot of this project copies parts of the spark project that fit with the encoder/decoder logic.

joan38 commented 1 year ago

Ok, feel free to copy this PR and modify as need to move forward. On our side we copied the library in our codebase since joinWiths are broken in the current state of this library. But we'd be happy to depend on it again once this is fixed.

joan38 commented 1 year ago

Hi, did we come to a decision on this? Should we move forward with this version in order to unlock joins?

joan38 commented 1 year ago

Superseded by #45