huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0
4.18k stars 354 forks

Can we please add the option to work with a tokenized dataset, especially for the CPT task? #144

Open shamanez opened 3 months ago

shamanez commented 3 months ago

Since we now have the CPT (continued pretraining) task, it would be nice to be able to feed a tokenized and packed dataset directly.
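For reference, a "tokenized and packed" dataset usually means token IDs from many documents concatenated into one stream and cut into fixed-length blocks. This is the same idea as the `group_texts` helper in the Hugging Face causal-LM example scripts. Below is a minimal pure-Python sketch of that shape. It assumes a hypothetical `pack_token_ids` helper, a block size you choose, and an optional EOS separator between documents. It is not part of the handbook's API.

```python
from typing import Dict, List, Optional


def pack_token_ids(
    tokenized_docs: List[List[int]],
    block_size: int,
    eos_token_id: Optional[int] = None,
) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: concatenate tokenized docs and split into fixed-size blocks.

    The trailing remainder shorter than block_size is dropped, as in group_texts.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    # Flatten all documents into one token stream, optionally separated by EOS.
    stream: List[int] = []
    for doc in tokenized_docs:
        stream.extend(doc)
        if eos_token_id is not None:
            stream.append(eos_token_id)

    # Keep only whole blocks; leftover tokens at the end are discarded.
    total = (len(stream) // block_size) * block_size
    input_ids = [stream[i : i + block_size] for i in range(0, total, block_size)]

    return {
        "input_ids": input_ids,
        # Every position in a packed block is a real token, so the mask is all ones.
        "attention_mask": [[1] * block_size for _ in input_ids],
        # Causal-LM labels are a copy of the inputs; HF causal-LM models shift them internally.
        "labels": [list(block) for block in input_ids],
    }


packed = pack_token_ids([[5, 6, 7], [8, 9], [10, 11, 12, 13]], block_size=4, eos_token_id=2)
print(packed["input_ids"])  # [[5, 6, 7, 2], [8, 9, 2, 10], [11, 12, 13, 2]]
```

You could wrap rows in this shape with `datasets.Dataset.from_dict(packed)`. This issue asks whether the handbook's CPT script could accept such a dataset directly, skipping its own tokenization step.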