Closed mkleinbort-ic closed 4 months ago
@westonpace is working on null support for plain encoder currently. I would expect this to land in a week or so. @westonpace is there extra work required to support nulls in list types?
:cold_sweat: I don't know about a week or so. I hope the encoders and MVP version of the v2 file writer will land in a week or so. However, I think there is still some work to go before everything percolates up to the top-level APIs (we need to integrate the new format with the scanner, etc.). Maybe the end of the month is more realistic for when users can start using these features.
@westonpace is there extra work required to support nulls in list types?
From the user perspective or from a development perspective?
Users shouldn't have to do anything. Once they upgrade Lance to the appropriate version it should just support writing nulls. Any old files written with the old format will still read nulls back as empty lists; there is no way to recover them.
https://github.com/lancedb/lance/issues/1929 is the tracking issue for the new format version
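To illustrate why nulls in files written with the old format cannot be recovered, here is a minimal pure-Python sketch of an Arrow-style list layout (offsets plus a validity mask). This is not Lance's actual encoder; the names encode_list_column and decode_list_column are hypothetical and exist only for this example. It assumes a null list is stored as a zero-length offset range, so once the validity information is not written, a null is indistinguishable from an empty list.

```python
def encode_list_column(values, keep_validity=True):
    """Flatten a list column into offsets + items (+ optional validity mask).

    Hypothetical helper for illustration only, not Lance's real encoder.
    """
    offsets = [0]
    items = []
    validity = []
    for value in values:
        if value is not None:
            items.extend(value)
        # A null and an empty list both add a zero-length range here.
        offsets.append(len(items))
        validity.append(value is not None)
    return {
        "offsets": offsets,
        "items": items,
        "validity": validity if keep_validity else None,
    }


def decode_list_column(column):
    """Rebuild the Python lists from the flattened representation."""
    offsets = column["offsets"]
    items = column["items"]
    validity = column["validity"]
    out = []
    for i in range(len(offsets) - 1):
        if validity is not None and not validity[i]:
            out.append(None)
        else:
            out.append(items[offsets[i]:offsets[i + 1]])
    return out


data = [None, [1, 2, 3], []]
with_nulls = decode_list_column(encode_list_column(data))
without_nulls = decode_list_column(encode_list_column(data, keep_validity=False))
```

With the validity mask kept, the round trip preserves the null; without it, the first entry comes back as an empty list, which matches the behaviour reported in this issue.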
Thank you both, I'll keep a close eye on this. Keen to migrate to lance, pending this fix.
How is this coming along? I see there is a lot to do in the writer V2 issue.
Do you know an estimate for this feature - about to kick off some refactoring next month and would love to move to lance as part of it - but waiting on this at the moment.
The V2 format is in beta right now. I think if you want nullability it's a good time to try it out and migrate. More compressive encodings are coming soon.
I don't think this is working at the moment (0.12.1):
import polars as pl
import lance
df_test_before = pl.DataFrame({
'x': [None, [1,2,3], []]
})
lance.write_dataset(df_test_before, 'df_test.lance', mode='overwrite', use_legacy_format=False)
>>> PanicException: not yet implemented: Implement encoding for field Field(id=0, name=x, type=large_list, children=[Field(id=1, name=item, type=int64), ])
Hmm it might just be that we have it for list (what PyArrow defaults to) and not large list (what Polars defaults to). We should probably implement large list as well.
This seems to be fixed - closing the issue.
Writing a table with a column of type list[int] containing nulls results in the nulls being filled in with [].