jondurbin / bagel

A bagel, with everything.

TrainingArguments should be initialized before the from_pretrained call #6

tenggyut closed this issue 8 months ago

tenggyut commented 8 months ago

TrainingArguments should be initialized before the from_pretrained call. Otherwise, zero3_init_flag will be ignored when using DeepSpeed to initialize larger models.

jondurbin commented 8 months ago

Unless I'm missing something, it is?

Arguments parsed here: https://github.com/jondurbin/bagel/blob/bcac409bc61491f089ab86f8a1e4463b5ff7e86c/bagel/tune/sft.py#L1119

Model is loaded: https://github.com/jondurbin/bagel/blob/bcac409bc61491f089ab86f8a1e4463b5ff7e86c/bagel/tune/sft.py#L1137

tenggyut commented 8 months ago

dpo.py initializes the model before the training_args.

jon-convai commented 8 months ago

> dpo.py initializes the model before the training_args.

Ah, I see - thanks, should be good now.
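
For anyone landing here later, a toy sketch of why the order matters. This is not the Transformers or DeepSpeed code, and every name in it (ToyTrainingArguments, toy_from_pretrained, _zero3_enabled) is hypothetical. It only assumes what the issue describes: building the training arguments registers the ZeRO-3 setting in process-wide state, and the model loader reads that state at the moment it is called.

```python
# Toy model of the ordering issue. All names are hypothetical stand-ins,
# not real Transformers/DeepSpeed APIs.

_zero3_enabled = False  # stands in for process-wide DeepSpeed state


class ToyTrainingArguments:
    """Constructing the arguments registers the ZeRO-3 setting globally."""

    def __init__(self, zero3_init_flag: bool):
        global _zero3_enabled
        self.zero3_init_flag = zero3_init_flag
        _zero3_enabled = zero3_init_flag


def toy_from_pretrained(name: str) -> dict:
    """The loader reads the global state once, at call time."""
    return {"name": name, "sharded_init": _zero3_enabled}


def _reset() -> None:
    """Return the global state to its default so each scenario is independent."""
    global _zero3_enabled
    _zero3_enabled = False


def load_model_first() -> dict:
    """Order reported for dpo.py: model is loaded before the arguments exist."""
    _reset()
    model = toy_from_pretrained("toy-model")
    ToyTrainingArguments(zero3_init_flag=True)  # too late to affect the load
    return model


def load_args_first() -> dict:
    """Order used in sft.py: arguments first, then the model."""
    _reset()
    ToyTrainingArguments(zero3_init_flag=True)
    return toy_from_pretrained("toy-model")
```

In this sketch, load_model_first() returns a model whose sharded_init is False even though the flag was requested, while load_args_first() picks it up. Setting the flag after the load has no effect because the loader never looks at it again.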