sjrusso8 / spark-connect-rs

Apache Spark Connect Client for Rust
https://docs.rs/spark-connect-rs
Apache License 2.0
76 stars 13 forks source link

feat: initial implementation of DataframeWriterV2 #30

Closed hntd187 closed 4 months ago

hntd187 commented 4 months ago

Description

Initial impl of DataframeWriterV2

Related Issue(s)

Documentation

Based off https://github.com/apache/spark/blob/master/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameWriterV2.scala and https://github.com/apache/spark/blob/57b207774382e3a35345518ede5cfc028885f90b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala

sjrusso8 commented 4 months ago

Thanks for submitting this! We might want to stick with using &str as the argument type over the trait ToString. I've been loosely basing some of the interface design off of the datafusion arguments, and they leverage &str. The rust docs also mention that ToString is related to Display.

Please look over the clippy messages on build step, but overall design this looks good! Thanks for doing this!

hntd187 commented 4 months ago

Well the relation is telling you not to implement ToString yourself, implement Display instead, but in this case I'm not implementing it, just using it for bounds. The only reason I added it was because I noticed you just do .to_string() inside the methods anyways, so I figured it gave some additional flexibility to what you could pass in as long it was a ToString object. I can remove it with &str if you'd prefer.

sjrusso8 commented 4 months ago

Lets stick with &str. The protobuf requires String so that's mostly why there is to_string() everywhere.

Since someone doesn't implement ToString directly, but actually implement the Display trait, we'd get some weird allowable input args. So any custom struct that implements Display would be an okay input from the compiler perspective, but would likely result in an error when submitted to the cluster.