hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0
34.3k stars 4.23k forks

Support for parquet datasets? #2467

Closed: tjthejuggler closed this issue 9 months ago

tjthejuggler commented 9 months ago

Reminder

Reproduction

I am trying to fine-tune with a Parquet dataset like this one:

https://huggingface.co/datasets/MadVoyager/stable_diffusion_instructional_dataset

Expected behavior

When I put it into the /data folder, I expected the dataset to show up in the list of datasets, but it didn't.

System Info

Others

Sorry if this is something simple or unrealistic or something!

hiyouga commented 9 months ago

Parquet is supported; see the data preparation section of the README: https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#data-preparation-optional
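
For readers who land here with the same problem: as far as I understand the linked section, copying a file into the data folder is not enough on its own, because a custom dataset also needs an entry in data/dataset_info.json before it appears in the dataset list. Below is a minimal sketch of adding such an entry with Python. The helper `register_dataset`, the dataset name `sd_instruct`, the file name `train.parquet`, and the column names `INSTRUCTION` and `RESPONSE` are all hypothetical, and the `file_name` and `columns` keys are my reading of the data README, so check them against the current documentation and your actual Parquet schema.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(info_path, name, file_name, columns):
    """Add (or overwrite) one entry in a dataset_info.json-style registry.

    Hypothetical helper: the "file_name" and "columns" keys are assumed
    from the project's data README, not confirmed in this thread.
    """
    info_path = Path(info_path)
    # Start from the existing registry if the file is already there.
    if info_path.exists():
        registry = json.loads(info_path.read_text(encoding="utf-8"))
    else:
        registry = {}
    registry[name] = {"file_name": file_name, "columns": columns}
    info_path.write_text(
        json.dumps(registry, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return registry[name]


# Demo against a throwaway directory standing in for the repo's data/ folder.
with tempfile.TemporaryDirectory() as tmp:
    info_file = Path(tmp) / "dataset_info.json"
    entry = register_dataset(
        info_file,
        name="sd_instruct",          # hypothetical dataset name
        file_name="train.parquet",   # hypothetical Parquet file placed in data/
        columns={                    # hypothetical column names in the file
            "prompt": "INSTRUCTION",
            "response": "RESPONSE",
        },
    )
    saved = json.loads(info_file.read_text(encoding="utf-8"))
    print(json.dumps(saved, indent=2))
```

To use this for real, point `info_path` at the repository's own data/dataset_info.json (back it up first) and set the column mapping to match the columns in your Parquet file. The dataset should then be selectable by the name you registered, assuming the entry format above matches your version.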

tjthejuggler commented 9 months ago

Thank you!