ctoth opened this issue 1 year ago · Open
I really appreciate you releasing this work. I have been trying to do something similar with the original StarCoder finetuning code, but have run into a variety of issues. Unfortunately, when I run this script on my own dataset (only around 6,800 MOO verbs), I hit a fairly rapid OOM on a machine with 8x A100 80GB cards. At first I thought it was because I was trying to increase max_seq_size (I was hoping for 1024 tokens), but dropping it back to 512 gave the same error. I then tried reducing the batch size to 1, but that also errored out with insufficient memory. The only other thing I changed is the prompt, and those changes were minor: mostly swapping in my own language and picking different columns out of my dataset.

Here is my run.sh:
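(What follows is a sketch rather than the verbatim script: the paths, script name, and most flag values are placeholders, and only the settings described above — 8 GPUs, max sequence length 512, per-device batch size 1, and an FSDP setting without auto_wrap — come from this report.)

```bash
#!/bin/bash
# Sketch of the launch command. Paths, script name, and flags other than
# the ones discussed in this thread are placeholders/assumptions.
torchrun --nproc_per_node=8 --master_port=29500 train.py \
    --model_name_or_path bigcode/starcoder \
    --data_path ./data/moo_verbs.json \
    --output_dir ./output \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --model_max_length 512 \
    --bf16 True \
    --fsdp "full_shard"   # note: no auto_wrap here
```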
Any idea what might be going wrong here? Can I give you any more info to help figure this out?

Hey @ctoth, thank you, glad it's useful!

The only thing I noticed about your training params is the lack of the auto_wrap option in the --fsdp argument (see godot_dodo_4x_60k_starcoder_15b_3ep, Transformers docs). Could you try adding that and report back? The rest all looks correct to me, and I was using the same hardware for my training runs.
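Concretely, the change would be along these lines (the wrap-target flag and class name below are from memory for StarCoder's GPTBigCode architecture, so double-check them against your transformers version):

```bash
# Enable FSDP auto-wrapping alongside full sharding (HF Trainer flags):
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap "GPTBigCodeBlock"
```

Without auto_wrap, FSDP treats the whole model as a single unit and gathers all parameters at once during forward/backward, which would explain an OOM even at batch size 1.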