lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0

Tracking Lance Datasets (DNC) #2086

Open tanaymeh opened 7 months ago

tanaymeh commented 7 months ago

This issue tracks the progress of deep learning datasets created using Lance.

If you would like to contribute to making these datasets, please reply to this issue with the dataset you wish to make into a Lance dataset and we would be happy to help you!

I8dNLo commented 7 months ago

Hi! I would like to help. Could you provide some more detailed guidance, such as a reference to a good existing solution for another dataset? I would like to cover Imagenet, for example.

tanaymeh commented 7 months ago

@I8dNLo Thanks for showing an interest in expanding the Lance datasets family!

To learn more about how to write a Lance dataset, check out the Lance Documentation for Reading and Writing Dataset. Since you showed an interest in Imagenet, here's a doc page that might come in handy for you: ImageURI.
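
In the meantime, here is a rough sketch of what an image conversion could look like. It is not an official recipe. It assumes a folder-per-class layout (one subdirectory per label, which is how Imagenet's training split is commonly unpacked), stores raw image bytes in a binary column rather than using ImageURI, and the helper names `iter_image_records` and `write_image_dataset` are made up for illustration. The only Lance call used is `lance.write_dataset`, fed by a PyArrow `RecordBatchReader` so the whole dataset never has to sit in memory.

```python
from pathlib import Path

# File extensions treated as images (an assumption; adjust for your data).
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def iter_image_records(root):
    """Yield one dict per image from a folder-per-class directory layout.

    Hypothetical helper: each subdirectory of `root` is treated as a class,
    and class ids are assigned in sorted directory-name order.
    """
    root = Path(root)
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    for label_id, class_dir in enumerate(class_dirs):
        for path in sorted(class_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                yield {
                    "image": path.read_bytes(),   # raw encoded bytes, not decoded pixels
                    "label": class_dir.name,
                    "label_id": label_id,
                    "filename": path.name,
                }


def write_image_dataset(root, uri, batch_size=1024):
    """Stream the images under `root` into a Lance dataset at `uri`.

    Hypothetical helper. Requires `pyarrow` and `pylance`; they are imported
    here so the directory walker above can be used without them.
    """
    import pyarrow as pa
    import lance

    schema = pa.schema([
        pa.field("image", pa.binary()),
        pa.field("label", pa.string()),
        pa.field("label_id", pa.int64()),
        pa.field("filename", pa.string()),
    ])

    def batches():
        # Group records into fixed-size batches to bound memory use.
        rows = []
        for record in iter_image_records(root):
            rows.append(record)
            if len(rows) == batch_size:
                yield pa.RecordBatch.from_pylist(rows, schema=schema)
                rows = []
        if rows:
            yield pa.RecordBatch.from_pylist(rows, schema=schema)

    reader = pa.RecordBatchReader.from_batches(schema, batches())
    return lance.write_dataset(reader, uri, schema=schema)
```

Calling something like `write_image_dataset("imagenet/train", "imagenet_train.lance")` (both paths are placeholders) would then produce a dataset you can open with `lance.dataset(...)`.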

I am currently creating documentation examples for making an image dataset using Lance, but feel free to open a PR if you beat me to it (ha!).

If, however, you want to work with text datasets, we have a couple of examples of how to do that. Check these out: Lance Examples.

Let me know if you need anything else from us!