Closed by ljch2018 10 months ago
1. The fraction of GPT4 (x-axis) may represent the size of the student model: '1' represents GPT4, 10^-2 represents GPT4 parameters * 0.01, and so on.
zidong is correct, except that 10^(-2) doesn't necessarily mean parameters * 0.01. It means a model with fewer parameters that was trained with 0.01x the compute.
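To make the distinction concrete, here is a rough sketch of why 0.01x the compute need not mean 0.01x the parameters. It assumes, purely for illustration, that parameter count scales as a power of training compute with exponent 0.5. That exponent and the helper name `params_fraction` are my assumptions, not something stated in the paper or in this thread.

```python
def params_fraction(compute_fraction: float, exponent: float = 0.5) -> float:
    """Hypothetical helper: fraction of parameters for a given fraction of compute.

    Assumes parameters scale as compute ** exponent. The default exponent of 0.5
    is an illustrative assumption, not a value taken from the paper.
    """
    return compute_fraction ** exponent


# Compare the x-axis value (compute fraction) with the implied parameter fraction.
for c in (1.0, 1e-2, 1e-4):
    print(f"compute x{c:g} -> parameters x{params_fraction(c):.3g}")
```

Under that assumption, a point at 10^(-2) on the x-axis would correspond to a model with far more than 1% of the parameters. With a different exponent the numbers change, so treat this only as an illustration of why the two quantities are not the same.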
I read the original paper, but I ran into some problems understanding the graph.
Thank you very much.