lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

[Python] Add video extension type #1384

Open rok opened 1 year ago

rok commented 1 year ago

After adding ImageURIArray, EncodedImageArray and FixedShapeImageTensorArray, it is straightforward to add analogous types for video, namely VideoURIArray, VideoEncodedArray and FixedShapeVideoTensorArray. For a decoder, see TF's decode_webp.

rok commented 10 months ago

@wjones127 I'm not sure whether this should be closed.

tonyf commented 3 months ago

Are there any docs on how to write these types to a lance dataset? Specifically I'm trying to create a video column that's some sort of image array type.

I'm experimenting with doing this instead of just storing the video as bytes to save on decoding time in my training loop.

wjones127 commented 3 months ago

Hi @tonyf. In general, you can write an Apache Arrow extension array, and these can be written to and read from Lance. A good reference for this would be Rok's changes for the image extension types:

https://github.com/lancedb/lance/pull/1272/files