voltrondata / spark-substrait-gateway

Implements a gateway that speaks the SparkConnect protocol and drives a backend using Substrait (over ADBC Flight SQL).
Apache License 2.0

feat: implement show_string using relations #1

Closed EpsilonPrime closed 8 months ago

EpsilonPrime commented 8 months ago

While writing this, it became evident that constructing the generated protobufs directly, even with list comprehensions, was going to be nearly impossible to read. I introduced the start of a substrait_builder class, which provides an easier way to construct relations and allows sections to be composed using functions. It doesn't implement all functions, has some repetition that could be eliminated (for instance, how many methods do we need that take two expressions as arguments?), and could be even cleaner if every function usage didn't need to pass in the function information. However, addressing those issues is beyond the scope of what we were trying to accomplish here. The builder really belongs in substrait-python, and moving it there will require refactoring how functions are converted from Spark names into Substrait. After that, the other code already in spark_to_substrait.py can be updated to use the builder, which will also make it more readable.
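
To illustrate the composability argument, here is a minimal sketch of the builder idea. It uses plain dictionaries as a stand-in for the generated Substrait protobuf classes, and every name in it (`field_reference`, `string_literal`, `binary_function`, `concat`, `project_relation`, `read_named_table`) is hypothetical; it is not the actual substrait_builder or substrait-python API.

```python
from typing import Any, Dict, List

# Stand-ins for the generated protobuf message types (assumption: the
# real code builds Substrait protobuf objects, not dictionaries).
Expression = Dict[str, Any]
Relation = Dict[str, Any]


def field_reference(index: int) -> Expression:
    """Refer to the column at the given position of the input."""
    return {"selection": {"struct_field": {"field": index}}}


def string_literal(value: str) -> Expression:
    """Wrap a Python string as a literal expression."""
    return {"literal": {"string": value}}


def binary_function(name: str, left: Expression, right: Expression) -> Expression:
    """One shared helper for every function that takes two expressions."""
    return {"scalar_function": {"name": name, "arguments": [left, right]}}


def concat(left: Expression, right: Expression) -> Expression:
    """Thin wrapper, so callers do not repeat the function name."""
    return binary_function("concat", left, right)


def read_named_table(name: str) -> Relation:
    """Leaf relation that reads from a named table."""
    return {"read": {"named_table": {"names": [name]}}}


def project_relation(input_rel: Relation, expressions: List[Expression]) -> Relation:
    """Relation that evaluates the expressions over the input relation."""
    return {"project": {"input": input_rel, "expressions": expressions}}


# Small builders compose into a plan that reads top-down.
plan = project_relation(
    read_named_table("people"),
    [concat(field_reference(0), string_literal("|"))],
)
```

A single `binary_function` helper is one possible way to remove the repetition among methods that take two expressions; whether that fits the real builder depends on how function information ends up being passed.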