lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.87k stars 213 forks

feat: null support for plain encoding #1643

Open rok opened 11 months ago

rok commented 11 months ago

Subtask of #16.

Create a proposal for implementing nullability using a per-mini-batch binary (validity) mask.
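To make the idea of a per-mini-batch binary mask concrete, here is a minimal sketch in plain Python. It is not Lance's encoder. It assumes fixed-width int32 values, an Arrow-style LSB-first validity bitmap (bit i is 1 when value i is present), and a zero placeholder in the data buffer for each null slot. The function names and the `batch_size` parameter are hypothetical.

```python
import struct
from typing import List, Optional, Tuple


def encode_plain_with_validity(values: List[Optional[int]]) -> Tuple[bytes, bytes]:
    """Hypothetical plain encoder: little-endian int32 values plus a validity bitmap.

    Bit i of the bitmap (LSB-first within each byte) is 1 when values[i] is present.
    Nulls still occupy a fixed-width slot (filled with 0), so value i always lives at offset 4 * i.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
        data += struct.pack("<i", 0 if v is None else v)
    return bytes(bitmap), bytes(data)


def decode_plain_with_validity(bitmap: bytes, data: bytes, n: int) -> List[Optional[int]]:
    """Inverse of encode_plain_with_validity: mask out slots whose validity bit is 0."""
    out: List[Optional[int]] = []
    for i in range(n):
        (v,) = struct.unpack_from("<i", data, 4 * i)
        valid = (bitmap[i // 8] >> (i % 8)) & 1
        out.append(v if valid else None)
    return out


def encode_mini_batches(values: List[Optional[int]], batch_size: int = 1024) -> List[Tuple[bytes, bytes]]:
    """Hypothetical: each mini-batch gets its own bitmap and data buffer."""
    return [
        encode_plain_with_validity(values[i : i + batch_size])
        for i in range(0, len(values), batch_size)
    ]


values = [7, None, 42, None, 5]
bitmap, data = encode_plain_with_validity(values)
print(bitmap, len(data))
print(decode_plain_with_validity(bitmap, data, len(values)))
```

Storing the mask per mini-batch keeps each batch independently decodable. A batch with no nulls could also skip its mask entirely, though this sketch always writes one.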

westonpace commented 9 months ago

@wjones127 to answer your question about structs and nullability: I agree that we should not try to store nulls for structs. A null struct should become a struct of nulls. This matches our current I/O most closely, so it is the easier change. If we need struct-level nulls, we can add them in the future.
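The following minimal Python sketch shows what "a null struct becomes a struct of nulls" means when struct rows are shredded into one column per field. It is an illustration only, not Lance's writer. It assumes structs arrive as Python dicts (or `None`), and `flatten_structs` is a hypothetical name.

```python
from typing import Dict, List, Optional


def flatten_structs(
    rows: List[Optional[Dict[str, Optional[int]]]], fields: List[str]
) -> Dict[str, List[Optional[int]]]:
    """Hypothetical shredding of struct rows into one column per field.

    No struct-level validity is kept: a null struct is written as a struct
    whose every field is null.
    """
    columns: Dict[str, List[Optional[int]]] = {f: [] for f in fields}
    for row in rows:
        for f in fields:
            columns[f].append(None if row is None else row.get(f))
    return columns


rows = [{"a": 1, "b": 2}, None, {"a": None, "b": None}]
print(flatten_structs(rows, ["a", "b"]))
```

After flattening, row 1 (a null struct) and row 2 (a struct whose fields are all null) look identical. That loss of information is the trade-off accepted in exchange for the simpler change.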

dnsco commented 9 months ago

Weighing in as a user: being able to write queries in DuckDB like "where some_struct is not null" is really nice. It is different from "where some_struct.some_field is not null".

I have no idea about the implementation implications on your side, but want to throw out that the former is really really nice.
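To make the difference between the two predicates concrete, here is a minimal Python sketch that keeps a separate struct-level validity list alongside the child columns. It is not how Lance or DuckDB store structs. All function names are hypothetical, and the "queries" are just list comprehensions returning matching row indices.

```python
from typing import Dict, List, Optional, Tuple

Row = Optional[Dict[str, Optional[int]]]


def flatten_with_struct_validity(
    rows: List[Row], fields: List[str]
) -> Tuple[List[bool], Dict[str, List[Optional[int]]]]:
    """Hypothetical: child columns plus one validity flag per struct row."""
    validity = [row is not None for row in rows]
    columns = {f: [None if row is None else row.get(f) for row in rows] for f in fields}
    return validity, columns


def struct_is_not_null(validity: List[bool]) -> List[int]:
    """Rows matching `some_struct IS NOT NULL`."""
    return [i for i, valid in enumerate(validity) if valid]


def field_is_not_null(columns: Dict[str, List[Optional[int]]], field: str) -> List[int]:
    """Rows matching `some_struct.<field> IS NOT NULL`."""
    return [i for i, v in enumerate(columns[field]) if v is not None]


rows = [{"a": 1, "b": 2}, None, {"a": None, "b": None}]
validity, columns = flatten_with_struct_validity(rows, ["a", "b"])
print(struct_is_not_null(validity))      # struct-level predicate
print(field_is_not_null(columns, "a"))   # field-level predicate
```

Row 2 is a non-null struct whose fields are all null, so the struct-level predicate keeps it while the field-level one drops it. Without the extra validity list, the two predicates could not be told apart.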

mkleinbort commented 7 months ago

Very keen on this feature. We'd like to move to Lance, but this is a deal breaker for our data infrastructure.

westonpace commented 7 months ago


It's in progress. Right now the plan is to deliver it as part of https://github.com/lancedb/lance/issues/1929. I've been focused on https://github.com/lancedb/lancedb/issues/926 recently, but should finish that up this week and then have some time to spend on Lance v2 again.