leoplusx opened this issue 1 year ago
Hi, happy to hear you're enjoying it!
You'd just have to change the data and the way you preprocess it (for example, you wouldn't apply the instruction prompt formatting), and also switch the collator from the Seq2Seq one (`DataCollatorForSeq2Seq`) to the language-modeling one (`DataCollatorForLanguageModeling`, with `mlm=False` for causal LM). There's a good tutorial on this in the Hugging Face docs: https://huggingface.co/docs/transformers/tasks/language_modeling#causal-language-modeling
Once you've changed the data, the preprocessing and the collator, the rest is pretty much the same.
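To make the preprocessing/collator change a bit more concrete, here's a rough, dependency-free sketch of what the two pieces do for unlabelled text. `group_texts` follows the idea in the linked tutorial: concatenate the tokenized documents and cut them into fixed-size blocks. `causal_lm_collate` is a simplified, hypothetical stand-in for what `DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)` does: pad the batch and use the input ids themselves as labels, with padding set to -100 so the loss ignores it. The function names, `block_size` and `pad_token_id` values here are illustrative assumptions; in a real run you'd use the tokenizer, `datasets.map` and the Hugging Face collator instead.

```python
from itertools import chain

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def group_texts(examples: dict[str, list[list[int]]], block_size: int) -> dict[str, list[list[int]]]:
    """Concatenate tokenized documents and split them into blocks of block_size tokens."""
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(concatenated["input_ids"])
    # Drop the tail that doesn't fill a whole block; if there is less than one
    # block of text in total, keep it as a single short block instead.
    if total_length >= block_size:
        total_length = (total_length // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }


def causal_lm_collate(blocks: list[list[int]], pad_token_id: int) -> dict[str, list[list[int]]]:
    """Hypothetical simplified stand-in for DataCollatorForLanguageModeling(mlm=False)."""
    max_len = max(len(b) for b in blocks)
    input_ids, attention_mask, labels = [], [], []
    for b in blocks:
        pad = max_len - len(b)
        input_ids.append(b + [pad_token_id] * pad)       # right-pad the inputs
        attention_mask.append([1] * len(b) + [0] * pad)  # don't attend to padding
        labels.append(b + [IGNORE_INDEX] * pad)          # labels = inputs, padding ignored
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


if __name__ == "__main__":
    examples = {"input_ids": [[1, 2, 3], [4, 5, 6, 7, 8, 9]]}
    chunks = group_texts(examples, block_size=4)
    print(chunks["input_ids"])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
    print(causal_lm_collate([[1, 2, 3, 4], [5, 6]], pad_token_id=0)["labels"])
```

Note there's no manual shifting of labels: the Hugging Face `*ForCausalLM` models shift the labels by one position internally, so for domain-adaptive pretraining the labels are just the input tokens.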
Good luck and feel free to reach out if you have more questions!
Hey, thanks for the repo!
Any ideas on how to use it to train with unlabelled data for causal language modelling? I want to adapt the foundation model to my domain first, before I do instruction fine-tuning.