microsoft / lst-bench

LST-Bench is a framework that allows users to run benchmarks specifically designed for evaluating Log-Structured Tables (LSTs) such as Delta Lake, Apache Hudi, and Apache Iceberg.
Apache License 2.0

New data cleaning scenario #156

Open jcamachor opened 1 year ago

jcamachor commented 1 year ago

An interesting data cleaning scenario is presented in the tutorial "Using Trino and Iceberg for data warehousing". The scenario relies on the NYC Taxi dataset and is implemented with Iceberg and the Trino engine, so it seems well suited for inclusion in LST-Bench, possibly extended to other table formats and engines.