voltrondata / spark-substrait-gateway

Implements a gateway that speaks the SparkConnect protocol and drives a backend using Substrait (over ADBC Flight SQL).
Apache License 2.0
15 stars, 8 forks

Support paths specified as DataFrameReader options #56

Closed: pthatte1-bb closed this issue 1 month ago

pthatte1-bb commented 1 month ago
Supported style (path passed directly to the reader method):
return spark_session.read.parquet(location_customer, mergeSchema=False)
Unsupported style (path passed through the "path" option):
spark_options = {"path": location_customer}
return spark_session.read.options(**spark_options).load()
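A rough sketch of what supporting both styles could look like on the gateway side follows. It assumes that `options(path=...).load()` reaches the server as an entry in the data source's options map with an empty path list, while `parquet(path)` fills the path list, so the gateway would merge the two before building the Substrait read. `ReadDataSource` and `resolve_paths` are hypothetical names, not part of the gateway or the Spark Connect API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReadDataSource:
    """Hypothetical stand-in for a Spark Connect read request (assumed shape)."""
    format: str = "parquet"
    options: Dict[str, str] = field(default_factory=dict)
    paths: List[str] = field(default_factory=list)


def resolve_paths(source: ReadDataSource) -> List[str]:
    """Collect input paths from both the explicit path list and the 'path' option.

    Spark treats data source option keys case-insensitively, so 'path',
    'Path' and 'PATH' are all accepted here.
    """
    paths = list(source.paths)
    for key, value in source.options.items():
        if key.lower() == "path" and value not in paths:
            paths.append(value)
    if not paths:
        raise ValueError("no path supplied via paths or the 'path' option")
    return paths
```

For example, `resolve_paths(ReadDataSource(options={"path": "/data/customer"}))` returns `["/data/customer"]`, the same result as passing the path directly.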
EpsilonPrime commented 1 month ago

This should be straightforward to add. We should also enforce that paths are limited to an allowed set of locations, or confined to a subdirectory, so that remote clients cannot use this to read arbitrary locations.
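A minimal sketch of that enforcement, assuming local filesystem paths and an allowlist of root directories configured on the gateway (`enforce_allowed_root` and `allowed_roots` are hypothetical names). It resolves `..` segments and symlinks before checking containment, and compares whole path components so that `/database` is not accepted under `/data`. Relative paths are resolved against the server's working directory, and remote URIs such as `s3://...` would need a separate scheme and prefix check.

```python
import os
from typing import Iterable


def enforce_allowed_root(path: str, allowed_roots: Iterable[str]) -> str:
    """Return the resolved path if it lies inside one of allowed_roots.

    Raises PermissionError otherwise. Only local filesystem paths are handled.
    """
    # Resolve symlinks and '..' so '/data/../etc' cannot escape '/data'.
    resolved = os.path.realpath(path)
    for root in allowed_roots:
        root_resolved = os.path.realpath(root)
        # commonpath compares whole components, unlike a string prefix check.
        if os.path.commonpath([resolved, root_resolved]) == root_resolved:
            return resolved
    raise PermissionError(f"path {path!r} is outside the allowed locations")
```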