Closed: rasbt closed this issue 6 months ago
We have one in https://arxiv.org/pdf/2401.02385.pdf, Figure 2. Do note that two bugs were found during the run: https://whimsical-aphid-86d.notion.site/Release-of-TinyLlama-1-5T-Checkpoints-Postponed-01b266998c1c47f78f5ae1520196d194?pvs=4 and https://whimsical-aphid-86d.notion.site/Latest-Updates-from-TinyLlama-Team-7d30c01fff794da28ccc952f327c8d4f?pvs=4. So we may not be able to draw conclusive results.
Thanks, I was a bit confused by this and thought this was something different. Figure 1 shows 3456 GPU hours for TinyLlama, which I assume is for 1 epoch? The 10^4 mark in Figure 2 would then correspond to ~3 epochs?
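For what it's worth, here's the quick arithmetic behind that guess, assuming (as Figure 1 suggests) that 3456 GPU hours corresponds to one full epoch:

```python
# Assumption: 3456 GPU hours = 1 epoch (per Figure 1 of the TinyLlama paper).
# Check roughly which epoch the 10^4 GPU-hour mark in Figure 2 falls into.
gpu_hours_per_epoch = 3456
mark = 1e4

epochs_at_mark = mark / gpu_hours_per_epoch
print(round(epochs_at_mark, 2))  # ~2.89, i.e. close to 3 epochs
```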
So sorry about the confusion in the report... We forgot to cite the figure in this paragraph.
Thanks for sharing this awesome work (and the paper write-up)! I was wondering if you by chance have a plot similar to the one from the Pythia paper but for all 3 epochs. If so, that would be super interesting and intriguing.
@rasbt what's your takeaway when you consider the Pythia work combined with this TinyLlama work?
Thanks for clarifying, @jzhang38. If I understand it correctly now, is the following assumption correct?
It looks like there's definitely an improvement due to architecture changes and perhaps the dataset :)
@rasbt you did not respect the log scaling of the x-axis.
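Good catch. For anyone redrawing the comparison: on a log-scaled axis the visual position of a point is proportional to log10 of its value, not the value itself, so markers placed by eye at linear spacing land in the wrong spot. A minimal sketch (hypothetical axis limits, not the paper's data):

```python
import math

def log_axis_fraction(x, xmin, xmax):
    """Fractional position (0..1) of x along a log-scaled axis spanning xmin..xmax."""
    return (math.log10(x) - math.log10(xmin)) / (math.log10(xmax) - math.log10(xmin))

# Example: on a hypothetical log axis from 10^3 to 3*10^4 GPU hours,
# the 10^4 mark sits about two thirds of the way across, not one third
# as a linear reading of the numbers would suggest.
print(round(log_axis_fraction(1e4, 1e3, 3e4), 2))  # ~0.68
```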