microsoft / lst-bench

LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as Delta Lake, Apache Hudi, and Apache Iceberg.
Apache License 2.0

Introduce configurations and workflow automation necessary to execute LST-Bench on various systems #238

Closed: jcamachor closed this issue 6 months ago

jcamachor commented 6 months ago

Currently, the codebase provides configurations for executing LST-Bench on Spark and Trino. However, these configurations are offered mainly as templates, are only loosely integrated into the codebase, and do not specify which engine and table format versions they are compatible with.

This issue aims to improve the reproducibility of LST-Bench executions. To achieve this, we will introduce configurations for executing LST-Bench on various systems, each explicitly tied to specific engine and table format versions. Additionally, we will provide workflow automation to execute LST-Bench on cloud infrastructure, starting with Azure. We will begin with the versions used in the LST-Bench paper (Spark 3.3.1 and Trino 420) and plan to expand to other versions over time.
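To illustrate what "explicitly tied to specific engine and table format versions" could mean in practice, here is a minimal sketch that keys each configuration by an exact (engine, engine version, table format, table format version) tuple and refuses to fall back to a configuration validated on a different version. The names `SystemVersion` and `ConfigRegistry` and the directory paths are hypothetical and do not reflect LST-Bench's actual configuration format or layout.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True, order=True)
class SystemVersion:
    """Hypothetical key: exact versions a configuration was validated against."""
    engine: str                # e.g. "spark" or "trino"
    engine_version: str        # e.g. "3.3.1" or "420"
    table_format: str          # e.g. "delta", "hudi", or "iceberg"
    table_format_version: str  # must be pinned explicitly; never inferred


class ConfigRegistry:
    """Hypothetical registry mapping pinned system versions to config directories."""

    def __init__(self) -> None:
        self._configs: dict[SystemVersion, PurePosixPath] = {}

    def register(self, system: SystemVersion, config_dir: str) -> None:
        # Reject a second configuration for the same exact versions.
        if system in self._configs:
            raise ValueError(f"duplicate configuration for {system}")
        self._configs[system] = PurePosixPath(config_dir)

    def supported(self) -> list[SystemVersion]:
        # All registered combinations, in a stable sorted order.
        return sorted(self._configs)

    def lookup(self, system: SystemVersion) -> PurePosixPath:
        # Exact match only: a configuration validated on one version is not
        # silently reused for another version of the engine or table format.
        try:
            return self._configs[system]
        except KeyError:
            listed = ", ".join(
                f"{s.engine} {s.engine_version} + {s.table_format} {s.table_format_version}"
                for s in self.supported()
            ) or "none"
            raise KeyError(f"no configuration for {system}; supported: {listed}") from None
```

With this design, requesting a combination that was never registered (for example, a Spark version other than 3.3.1) fails loudly and lists the supported combinations, instead of running with a configuration that was never validated for that version.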