iterative / datachain

AI-data warehouse to enrich, transform and analyze unstructured data
https://docs.datachain.ai
Apache License 2.0

Load from / to Hugging Face? #236

Open lhoestq opened 3 months ago

lhoestq commented 3 months ago

Hi! I'm Quentin from HF :)

Congrats on the release! The API is concise and easy to use; it will be useful to many people.

I was wondering whether you have plans to support reading from / writing to HF datasets?

If you use fsspec, it might already work out of the box with hf:// paths (provided the huggingface_hub library is installed).
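To illustrate the fsspec route: with huggingface_hub installed, fsspec maps the hf:// protocol to huggingface_hub's HfFileSystem, and dataset files are addressed as hf://datasets/<owner>/<name>[@<revision>]/<path>. The sketch below is a minimal example under that assumption; `hf_dataset_path` and `read_hf_bytes` are hypothetical helper names, not part of datachain, and the repo id is a placeholder.

```python
from typing import Optional
from urllib.parse import quote


def hf_dataset_path(repo_id: str, path: str = "", revision: Optional[str] = None) -> str:
    """Build an hf:// URL for a file in a Hugging Face dataset repo (hypothetical helper)."""
    if not repo_id:
        raise ValueError("repo_id must be non-empty, e.g. 'owner/name'")
    # Revisions containing '/' (e.g. 'refs/convert/parquet') must be URL-encoded.
    repo = f"{repo_id}@{quote(revision, safe='')}" if revision else repo_id
    url = f"hf://datasets/{repo}"
    return f"{url}/{path.lstrip('/')}" if path else url


def read_hf_bytes(url: str) -> bytes:
    """Read a whole file through fsspec (hypothetical helper).

    Assumes fsspec and huggingface_hub are installed: fsspec maps the 'hf'
    protocol to huggingface_hub.HfFileSystem, so fsspec.open() can resolve it.
    """
    import fsspec  # imported lazily so hf_dataset_path works without fsspec

    with fsspec.open(url, "rb") as f:
        return f.read()


if __name__ == "__main__":
    print(hf_dataset_path("some-owner/some-dataset", "data/train.csv"))
```

Calling `read_hf_bytes` on such a URL downloads the file, so it needs network access (and a token for private or gated repos); building the URL does not.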

dmpetrov commented 3 months ago

@lhoestq thank you for asking this! Do you have a specific use case in mind? Which dataset, and what do you want to do with it next?

Let us evaluate this. It seems straightforward, and our other project, DVC, already supports this. We will get back to you soon.

lhoestq commented 3 months ago

Cool! The main use cases I imagine are transforming the rows of existing datasets, or generating more rows from them with an LLM.

dtulga commented 1 month ago

This article may be helpful for a future structured export function: https://huggingface.co/docs/datasets/en/repository_structure
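For such an export, one layout the linked page describes is putting the split name in each file name, optionally with a `configs` section in the README.md YAML header that maps splits to file patterns. The sketch below only generates those file names and that YAML text. The shard naming `{split}-{index:05d}-of-{total:05d}` is an assumption that resembles what Hugging Face tooling commonly writes, and the function names are hypothetical, not an existing datachain API.

```python
from typing import Iterable, List


def shard_paths(split: str, num_shards: int, ext: str = "parquet", prefix: str = "data") -> List[str]:
    """Return paths such as data/train-00000-of-00002.parquet (hypothetical helper)."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    return [f"{prefix}/{split}-{i:05d}-of-{num_shards:05d}.{ext}" for i in range(num_shards)]


def readme_configs_yaml(splits: Iterable[str], prefix: str = "data") -> str:
    """Return a README.md YAML header mapping each split to a glob over its shards."""
    lines = ["---", "configs:", "- config_name: default", "  data_files:"]
    for split in splits:
        lines.append(f"  - split: {split}")
        lines.append(f'    path: "{prefix}/{split}-*"')
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for p in shard_paths("train", 2) + shard_paths("test", 1):
        print(p)
    print(readme_configs_yaml(["train", "test"]))
```

Keeping the split name in the file names lets the Hub infer splits even without the YAML header; the header simply makes the mapping explicit.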