Closed sjrusso8 closed 1 month ago
@hntd187 and @abrassel looking for some input on this :) Does this implementation make sense?
PS @MrPowers I know you are a big advocate of using the transform
method. How does this look?
I wouldn't try to emulate the Python call signature; that is likely something that will drive you insane. Your definition is very similar to the Scala one: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html#transform[U](t:org.apache.spark.sql.Dataset[T]=%3Eorg.apache.spark.sql.Dataset[U]):org.apache.spark.sql.Dataset[U]
So I think one function call here, DataFrame in, DataFrame out, is fine, and people can just chain them along as they go. Yeah, and the call site is `self` too, so I think this is good.
I also like this implementation :)
Description
feat(dataframe): implement `transform`

Implements `transform`, which accepts a closure whose first parameter is a `DataFrame`.

Example Usage

This uses closures and differs slightly from the PySpark implementation. PySpark allows positional or keyword arguments; it's a little tricky to implement that specific option in Rust. So I opted for just a closure that accepts and returns a `DataFrame`, and it's on the user to provide any additional arguments as part of the closure's captured scope.
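The captured-scope design described above can be sketched with a toy `DataFrame` (this is a minimal, self-contained illustration, not the actual API of this crate; the `DataFrame` struct, `add_n`, and `double` here are hypothetical stand-ins):

```rust
// Toy stand-in for a DataFrame, just enough to show the transform pattern.
#[derive(Debug, Clone, PartialEq)]
struct DataFrame {
    rows: Vec<i64>,
}

impl DataFrame {
    // `transform` takes a closure DataFrame -> DataFrame and applies it,
    // mirroring the Scala `Dataset.transform` signature discussed above.
    fn transform<F>(self, f: F) -> DataFrame
    where
        F: FnOnce(DataFrame) -> DataFrame,
    {
        f(self)
    }
}

// A transformation that needs an extra argument. Instead of kwargs,
// the caller supplies `n` via the closure's captured scope.
fn add_n(df: DataFrame, n: i64) -> DataFrame {
    DataFrame { rows: df.rows.into_iter().map(|x| x + n).collect() }
}

// A transformation with no extra arguments, to show chaining.
fn double(df: DataFrame) -> DataFrame {
    DataFrame { rows: df.rows.into_iter().map(|x| x * 2).collect() }
}

fn main() {
    let df = DataFrame { rows: vec![1, 2, 3] };
    let n = 10; // captured by the closure rather than passed as a kwarg
    let out = df.transform(|d| add_n(d, n)).transform(double);
    assert_eq!(out.rows, vec![22, 24, 26]);
    println!("{:?}", out);
}
```

Because `transform` consumes `self` and returns a `DataFrame`, calls chain naturally, which is the "Df in, Df out" shape the review comments describe.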