Closed jberkel closed 10 months ago
Hey @jberkel, the current behavior with `_n` mirrors the behavior of the tuple encoder provided by Spark.
You can build a different encoder by defining your own Serializer and Deserializer for a Tuple (e.g. `given [T <: Tuple]: Serializer[T] = ???`).
It is a bit involved but should be doable.
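To make the suggestion above concrete, here is a rough sketch of what such a customization could look like. The `Serializer`/`Deserializer` trait names follow the comment, but their exact signatures in the library may differ, and the bodies are deliberately left as `???` since the element-wise encoding is the involved part:

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.DataType

// Hypothetical shapes of the library's typeclasses, shown only for
// illustration; consult the actual trait definitions before implementing.
trait Serializer[T]:
  def inputType: DataType
  def serialize(input: Expression): Expression

trait Deserializer[T]:
  def inputType: DataType
  def deserialize(path: Expression): Expression

// Given instances covering every Tuple subtype would override the default
// Product-based (struct with _1, _2, ...) encoding.
given tupleSerializer[T <: Tuple]: Serializer[T] = ???
given tupleDeserializer[T <: Tuple]: Deserializer[T] = ???
```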
@vincenzobaz Thanks for the pointer, I'll give it a go and add the implementation to this issue.
@jberkel no problem! We could add your implementation to the readme as an example of encoder customization :)
Right now tuples are encoded as structs with keys `_1`, `_2`, etc. for each element. This is understandable, as tuples are just a kind of `Product`. However, it is a problem when compatibility with other libraries is required. upickle, for instance, serializes tuples as lists: https://com-lihaoyi.github.io/upickle/#Collections

Does it make sense to support this form of tuple encoding? It would also save space, since no keys are needed. However, I imagine it wouldn't play well with the schema for mixed-type tuples?
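For reference, the upickle behavior mentioned above looks like this (a minimal sketch; the dependency coordinate and version are illustrative):

```scala
//> using dep "com.lihaoyi::upickle:3.1.0"
import upickle.default.*

@main def demo(): Unit =
  // upickle encodes tuples as JSON arrays rather than key/value structs,
  // so no _1/_2 keys appear in the output.
  println(write((1, "two", true)))
```

This prints `[1,"two",true]`, which is the list-style encoding the issue asks about.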