huggingface / datasets

🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
18.81k stars 2.6k forks

Integrate Polars library #3334

Open albertvillanova opened 2 years ago

albertvillanova commented 2 years ago

Check potential integration of the Polars library: https://github.com/pola-rs/polars

CC: @thomwolf @lewtun

lewtun commented 2 years ago

If possible, a neat API could be something like Dataset.to_polars(), as well as Dataset.set_format("polars").

albertvillanova commented 2 years ago

Note that they use a "custom" implementation of Arrow: Arrow2.

braaannigan commented 1 year ago

Polars has grown rapidly in popularity over the last year - could you consider integrating the Polars functionality again?

I don't think the "custom" implementation should be a barrier, since it still conforms to the Arrow specification.

amrit110 commented 10 months ago

Is there any direction on this from the HF team, @lewtun? Can conversion from Polars to an HF dataset be implemented with limited or zero copy? So, something like Dataset.from_polars() and Dataset.to_polars(), like you mentioned. Happy to contribute if I can get some pointers on how this might be implemented.

fzyzcjy commented 4 months ago

Hi, are there any updates? Thanks!