I think it would be good to see how the performance of TinyLlama and TinyLlama-chat evolves across checkpoints.
We can get this from the HL leaderboard, but that takes quite a long time.
What would you suggest as a benchmark for comparing TinyLlama between versions?