Open mad-legion opened 2 years ago
```
In [1]: import deltalake
   ...: import pandas as pd
   ...:
   ...: path_to_table = "d:/temp/delta/mytable"
   ...:
   ...: # Initial write creates the table.
   ...: df = pd.DataFrame({'x': [1, 2, 3]})
   ...: deltalake.writer.write_deltalake(path_to_table, df)
   ...:
   ...: # Append a second batch of rows to the same table.
   ...: df = pd.DataFrame({'x': [4, 5, 6]})
   ...: deltalake.writer.write_deltalake(path_to_table, df, mode='append')
   ...:
   ...: dt = deltalake.DeltaTable(path_to_table)
   ...:

In [2]: df = dt.to_pandas()

In [3]: df
Out[3]:
   x
0  1
1  2
2  3
3  4
4  5
5  6
```
This generated two Parquet files, one per write:
05/24/2022 09:12 AM <DIR> .
05/24/2022 09:12 AM <DIR> ..
05/24/2022 09:12 AM 1,652 0-11e8473d-ac00-426a-b149-dce5445597bb-0.parquet
05/24/2022 09:12 AM 1,652 1-d06d5dbd-c0be-475c-b19a-641c59e4cc93-0.parquet
05/24/2022 09:12 AM <DIR> _delta_log
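
To make the relationship between the Parquet files and `_delta_log` concrete, here is a minimal sketch that replays the JSON commit files and lists the data files that are currently part of the table. It relies only on what the Delta protocol describes (newline-delimited JSON commits containing `add` and `remove` actions) and deliberately ignores checkpoints, so it is an illustration rather than a full reader. The helper name `active_files` is hypothetical and is not part of the `deltalake` package.

```python
import json
from pathlib import Path


def active_files(table_path):
    """Replay the JSON commits in _delta_log and return the live data file paths.

    Simplified sketch: checkpoint files are ignored, so this only works for
    tables whose full history is still available as JSON commits.
    """
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit file names are zero-padded version numbers, so sorting by name
    # replays them in commit order.
    for commit in sorted(log_dir.glob("*.json")):
        with commit.open(encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    # An `add` action registers a data file with the table.
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    # A `remove` action drops a previously added file.
                    live.discard(action["remove"]["path"])
    return sorted(live)
```

Assuming no checkpoint has been written yet, calling `active_files("d:/temp/delta/mytable")` on the table above would be expected to return the two Parquet file names from the listing, one for each write.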
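```rust
use std::sync::Arc;

use datafusion::prelude::SessionContext;

// `Result` and `TelemetryGuard` are defined elsewhere (not shown in this snippet).
async fn test_lakehouse_query() -> Result<()> {
    let _telemetry_guard = TelemetryGuard::default().unwrap();
    let table_path = "d:/temp/cache/tables/3F5F22FF-445B-2156-96F6-3F8CA984968E/spans";
    // Open the Delta table and register it with DataFusion under the name `spans`.
    let table = deltalake::open_table(&table_path).await?;
    let ctx = SessionContext::new();
    ctx.register_table("spans", Arc::new(table))?;
    // Run the query and collect the resulting record batches.
    let batches = ctx
        .sql("SELECT count(*) FROM spans where begin_ms > 5000")
        .await?
        .collect()
        .await?;
    dbg!(batches);
    Ok(())
}
```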
The directory contains 1874 files for a total of 4.5 GB. The test executes in 0.53 seconds, which is not bad (the answer is 149341791).
https://crates.io/crates/deltalake
https://github.com/delta-io/delta/blob/master/PROTOCOL.md