Open wjones127 opened 3 months ago
To implement this, first we would need to expose an API like this:
import lance
dataset = lance.dataset("test")
lance.sql("SELECT * FROM table", table=dataset).to_table()
That should be straightfoward since we already have a table provider.
Then, we would need:
DistinctIndexScan
DistinctIndexScan
for indexed data
Right now, the recomended way to get distinct values of columns is to use duckdb:
However, if we have scalar indices on that column, we could execute the query much more quickly. We likely couldn't do that through the DuckDB integration, but we could do it within a DataFusion query easily.