Stillerman closed this pull request 1 year ago
I've read this and it seems okay, but someone should actually run it on a GPU.
Switched from character-based FIM to token-based. Now we no longer need the expensive encode/decode round-trip, and data loading is not slowed down at all.
FIM time: 37.27 s
Non-FIM time: 37.64 s
Will test on GPU soon
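For reference, here is a minimal sketch of the kind of token-level permutation I mean, assuming numpy and dedicated FIM sentinel token ids (the function name, argument names, and default rates below are illustrative, not the exact code in this PR):

```python
import numpy as np

def permute_tokens(tokens, np_rng, prefix_tok_id, suffix_tok_id, middle_tok_id,
                   fim_rate=0.5, fim_spm_rate=0.5):
    """With probability `fim_rate`, rearrange a tokenized sample into a
    fill-in-the-middle format, choosing SPM vs. PSM ordering with probability
    `fim_spm_rate`. Works directly on token ids, so no decode/re-encode is needed."""
    if np_rng.binomial(1, fim_rate) == 0:
        return tokens  # keep the sample in ordinary left-to-right order

    # Pick two cut points and split the sample into prefix / middle / suffix.
    lo, hi = sorted(np_rng.integers(low=0, high=len(tokens) + 1, size=2))
    prefix, middle, suffix = tokens[:lo], tokens[lo:hi], tokens[hi:]

    if np_rng.binomial(1, fim_spm_rate):
        # SPM ordering: <prefix><suffix>{suffix}<middle>{prefix}{middle}
        return [prefix_tok_id, suffix_tok_id] + suffix + [middle_tok_id] + prefix + middle
    # PSM ordering: <prefix>{prefix}<suffix>{suffix}<middle>{middle}
    return [prefix_tok_id] + prefix + [suffix_tok_id] + suffix + [middle_tok_id] + middle
```

Here `np_rng` would be something like `np.random.default_rng(seed)`, and the sentinel ids come from the tokenizer's FIM special tokens.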
Finetuned for 48 hours on an A100 on the Ruby split of The Stack and got this:
MultiPL-E pass@1 was 0.1
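For context, MultiPL-E's pass@1 is (as I understand it) the standard unbiased pass@k estimator from the Codex paper, averaged over problems. A small sketch with made-up counts, just to illustrate the computation:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimator of pass@k for one problem, given n generations
    of which c passed the tests (Chen et al., 2021)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up example: 20 generations per problem, per-problem pass counts.
passes_per_problem = [2, 0, 1, 0, 3]
pass_at_1 = sum(pass_at_k(20, c, 1) for c in passes_per_problem) / len(passes_per_problem)
print(f"pass@1 = {pass_at_1:.2f}")  # 0.06 for these counts
```

With k = 1 this reduces to the average fraction of passing generations per problem, so the 0.1 above roughly means 10% of sampled completions passed.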
I addressed all of the comments in that last commit. The only change of note is that I removed eval_no_fim instead of making it an additional eval set, since it was left over from testing. Is there a reason we would want to add it as an additional eval set?
Also, reran a toy finetune for an hour and everything seems to be working the same as it did before.
Finetuned on Julia for 2M tokens, both with and without FIM, and tested both with MultiPL-E.
With FIM: pass@1 = 0.10
Without FIM: pass@1 = 0.09
Looks great! Thanks for working on this and for your patience!
Here is a draft that adds optional FIM permutations during finetuning. Some things I was hoping to get feedback on:
- I added the permuting as part of the `ConstantLengthDataset` class. Is that the right place to do it? (A rough sketch of what I mean is after this list.)
- With FIM enabled, batches take ~2x as long to load. Is this acceptable? Here are the stats for loading 5 batches with and without FIM on my Intel MacBook Pro.
- I cannot actually test that training still works in the full script because I don't have access to a GPU. Training seems to work in the Colab I adapted, but this should be checked to make sure all is still well.
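Here is a rough sketch of how the permutation hooks into a `ConstantLengthDataset`-style iterator, using the `permute_tokens` sketch above (simplified, with assumed field and sentinel-token names; the real class also handles buffering and truncation details I'm glossing over):

```python
import numpy as np
import torch
from torch.utils.data import IterableDataset

class ConstantLengthDataset(IterableDataset):
    """Packs tokenized examples into fixed-length chunks, optionally applying
    a FIM permutation to each example before packing. Simplified sketch."""

    def __init__(self, tokenizer, dataset, seq_length=2048,
                 fim_rate=0.5, fim_spm_rate=0.5, seed=0):
        self.tokenizer = tokenizer
        self.dataset = dataset
        self.seq_length = seq_length
        self.fim_rate = fim_rate
        self.fim_spm_rate = fim_spm_rate
        self.np_rng = np.random.default_rng(seed)
        # Assumed sentinel names; use whatever FIM special tokens the tokenizer defines.
        self.prefix_id = tokenizer.convert_tokens_to_ids("<fim_prefix>")
        self.suffix_id = tokenizer.convert_tokens_to_ids("<fim_suffix>")
        self.middle_id = tokenizer.convert_tokens_to_ids("<fim_middle>")

    def __iter__(self):
        buffer = []
        for example in self.dataset:
            tokens = self.tokenizer(example["content"])["input_ids"]
            if self.fim_rate > 0:
                # Permute at the token level, so no decode/re-encode is needed.
                tokens = permute_tokens(
                    tokens, self.np_rng,
                    prefix_tok_id=self.prefix_id,
                    suffix_tok_id=self.suffix_id,
                    middle_tok_id=self.middle_id,
                    fim_rate=self.fim_rate,
                    fim_spm_rate=self.fim_spm_rate,
                )
            buffer.extend(tokens + [self.tokenizer.eos_token_id])
            # Emit fixed-length chunks as soon as the buffer is long enough.
            while len(buffer) >= self.seq_length:
                chunk, buffer = buffer[:self.seq_length], buffer[self.seq_length:]
                yield torch.tensor(chunk)
```

Doing the permutation per example, before packing, means each FIM triple comes from a single source file; whether this is the right place for it is exactly the question above.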