Closed abrassel closed 1 month ago
Thanks for doing this! Just a few notes for some of the sections.
- `sparkContext`, and the row from the `SparkSession` table: it's a JVM attribute and isn't supported with Spark Connect.
- `remote` should refer to the Spark Connect connection string and link to this page: https://github.com/apache/spark/blob/master/connector/connect/docs/client-connection-string.md
- `enableHiveSupport` is not supported with Spark Connect.
- These `StreamingQuery` members are implemented:
  - `id`
  - `run_id` (should be changed to `runId`)
  - `name`
  - `awaitTermination`
  - `lastProgress`
  - `recentProgress`
  - `isActive`
  - `status`
- These `DataFrameReader` methods are implemented:
  - `format`
  - `load`
  - `option`
  - `options`
  - `table`
I'm not sure if `UdfRegistration` and `UdtfRegistration` would be possible in Rust. I think each of those depends on the JVM or on a specific Python function being serialized and then evaluated on the workers.
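For the `remote` note above, here is a minimal sketch of how a connection string in that format could be split into its parts. `ConnectTarget` and `parse_connection_string` are hypothetical names for illustration, not this crate's API. The sketch assumes the `sc://host:port/;key=value;key=value` shape and the default port 15002 described on the linked page, and it does not handle bracketed IPv6 hosts.

```rust
use std::collections::HashMap;

/// Parsed pieces of a Spark Connect connection string.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct ConnectTarget {
    host: String,
    port: u16,
    params: HashMap<String, String>,
}

/// Parse `sc://host[:port][/;key=value;...]` (hypothetical helper).
fn parse_connection_string(s: &str) -> Result<ConnectTarget, String> {
    let rest = s
        .strip_prefix("sc://")
        .ok_or_else(|| "connection string must start with sc://".to_string())?;

    // Everything before the first '/' is host[:port]; the rest holds the parameters.
    let (authority, path) = match rest.split_once('/') {
        Some((a, p)) => (a, p),
        None => (rest, ""),
    };

    let (host, port) = match authority.split_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().map_err(|e| format!("invalid port: {e}"))?),
        None => (authority, 15002), // assumed default port
    };
    if host.is_empty() {
        return Err("missing host".to_string());
    }

    // Parameters are ';'-separated key=value pairs; empty segments are skipped.
    let mut params = HashMap::new();
    for pair in path.split(';').filter(|p| !p.is_empty()) {
        let (k, v) = pair
            .split_once('=')
            .ok_or_else(|| format!("parameter `{pair}` is not key=value"))?;
        params.insert(k.to_string(), v.to_string());
    }

    Ok(ConnectTarget { host: host.to_string(), port, params })
}

fn main() {
    let t = parse_connection_string("sc://localhost:15002/;user_id=alice;use_ssl=true").unwrap();
    println!("{}:{} {:?}", t.host, t.port, t.params.get("user_id"));
}
```

Keeping the parameters in a plain map leaves it to the caller to decide which keys (for example `token` or `use_ssl`) it understands.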
I think that we can probably do UDFs if we use pyo3 or an equivalent to generate Python lambdas.
Thanks for the feedback, @sjrusso8! I think I implemented all of the changes.
Description
I went through the pyspark documentation and attempted to
Related Issue(s)
Documentation
https://spark.apache.org/docs/latest/api/python/index.html