Closed · brando90 closed this issue 11 months ago
Did you use the full 4K context length of Llama 2 for training on each sample?
I see you have 395K examples and used the 4K-context Llama 2, so an upper bound is 4K * 395K ≈ 1.6B tokens. Is it possible to get a more precise count of the number of tokens trained on?
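For reference, a rough sketch of how the exact count could be computed, assuming the training data is available as a Hugging Face dataset with a `text` field and that the Llama 2 tokenizer is used (the dataset name and field name below are placeholders, not the actual ones from this repo):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Placeholders: swap in the actual tokenizer checkpoint and training dataset.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
dataset = load_dataset("your-org/your-training-data", split="train")  # hypothetical name

# Cap each sample at the 4K context length, matching truncation during training.
max_len = 4096
total_tokens = 0
for example in dataset:
    ids = tokenizer(example["text"], truncation=True, max_length=max_len)["input_ids"]
    total_tokens += len(ids)

print(f"Total training tokens (<= {max_len} per sample): {total_tokens:,}")
```

This would give the per-sample token counts after truncation, so the sum is the precise figure rather than the 4K * 395K upper bound.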