snowflakedb / spark-snowflake

Snowflake Data Source for Apache Spark.
http://www.snowflake.net
Apache License 2.0

[Performance Improvement] Support for AQE mode for delayed query pushdown for optimum runtime & improved debugging #535

Open jalpan-randeri opened 9 months ago

jalpan-randeri commented 9 months ago

Under adaptive query execution (AQE), Spark interleaves the planning and execution phases, so planning runs multiple times. The current implementation eagerly pushes the query down during the planning stage, which sends redundant queries to Snowflake and ignores filters discovered at runtime.

This commit handles this scenario by delaying pushdown. This gives Spark AQE a chance to generate the optimal plan and eliminates the pushdown of redundant queries. Performance improves because new filters identified at runtime by AQE are pushed down as well.
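
To illustrate the difference between eager and delayed pushdown, here is a minimal sketch in plain Python. It is not the connector's code: `FakeWarehouse`, `build_sql`, `eager_plan`, `deferred_plan`, the `orders` table, and the filters are all hypothetical names made up for this example. It models only the redundant-query aspect, assuming planning runs twice and the second pass sees an extra filter found at runtime.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FakeWarehouse:
    """Hypothetical stand-in for Snowflake that records every query it receives."""
    received: List[str] = field(default_factory=list)

    def run(self, sql: str) -> str:
        self.received.append(sql)
        return f"result of: {sql}"


def build_sql(filters: List[str]) -> str:
    """Render a query for a hypothetical 'orders' table from a list of filters."""
    where = " AND ".join(filters) if filters else "TRUE"
    return f"SELECT * FROM orders WHERE {where}"


def eager_plan(warehouse: FakeWarehouse, filters: List[str]) -> Callable[[], str]:
    """Eager pushdown: the query is sent as soon as the plan is built."""
    result = warehouse.run(build_sql(filters))
    return lambda: result


def deferred_plan(warehouse: FakeWarehouse, filters: List[str]) -> Callable[[], str]:
    """Delayed pushdown: planning only captures the filters; the query is sent on execution."""
    captured = list(filters)
    return lambda: warehouse.run(build_sql(captured))


def simulate(plan_fn) -> List[str]:
    """Plan twice (as re-planning under AQE might), then execute only the final plan."""
    warehouse = FakeWarehouse()
    # Planning pass 1: only the static filter is known.
    plan_fn(warehouse, ["region = 'EU'"])
    # Planning pass 2: a filter discovered at runtime is now available.
    final = plan_fn(warehouse, ["region = 'EU'", "customer_id IN (1, 2)"])
    final()
    return warehouse.received


eager_queries = simulate(eager_plan)
deferred_queries = simulate(deferred_plan)
print(len(eager_queries), len(deferred_queries))
```

In this toy model the eager variant contacts the warehouse once per planning pass, while the deferred variant sends a single query that already includes the runtime filter.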

Furthermore, it logs the pushdown query in the Spark plan, which makes debugging easier from the Spark History Server, the UIs, and the logs.
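
As a rough illustration of why this helps, the sketch below shows a plan node whose one-line description carries the pushed-down SQL, so any tool that prints the plan also shows the query. The class `PushdownScanNode`, its `simple_string` method, and the output format are hypothetical and are not taken from this PR or from Spark's API.

```python
class PushdownScanNode:
    """Hypothetical plan node that remembers the SQL pushed down to the warehouse."""

    def __init__(self, relation: str, pushdown_sql: str) -> None:
        self.relation = relation
        self.pushdown_sql = pushdown_sql

    def simple_string(self) -> str:
        # Embed the query in the node description so plan printouts include it.
        return f"SnowflakeScan {self.relation} [pushdown: {self.pushdown_sql}]"


node = PushdownScanNode("orders", "SELECT * FROM orders WHERE region = 'EU'")
description = node.simple_string()
print(description)
```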

This PR adds a new unit test suite for it.

urosstan-db commented 2 months ago

@jalpan-randeri Do we plan to merge this? It could also fix the following issue: https://github.com/snowflakedb/spark-snowflake/issues/567

urosstan-db commented 2 months ago

@sfc-gh-bli Do we plan to merge this PR?

jalpan-randeri commented 2 months ago

Yes, I plan to merge this. However, I am waiting for a review of this PR. Can you review it?

urosstan-db commented 2 months ago

> Yes, i plan to merge this. However I am waiting for review of this PR. Can you review it?

Overall, it looks good, but I am not a committer, so you need approval from someone from snow

jalpan-randeri commented 2 months ago

@sfc-gh-bli please review and share your thoughts