huggingface / datablations

Scaling Data-Constrained Language Models
https://arxiv.org/abs/2305.16264
Apache License 2.0
321 stars, 20 forks

Flop contour #11

Closed (borgr closed 11 months ago)

borgr commented 11 months ago

The contour plot includes many model sizes. Is there a place that lists how many FLOPs each of them used? (The other plots always use 3 sizes, so the one table that specifies them is enough.) Or is there an easy formula to derive FLOPs from the parameter count? Sorry for bothering you... You've got some really useful data lying around here.

Muennighoff commented 11 months ago

Yeah, you can compute them using FLOPs = 6 * N * D, where N is the number of parameters and D is the total number of tokens. The formula is also in the paper.
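
For anyone who wants to fill in the numbers for the contour runs, here is a minimal sketch of that formula in Python. The function name `training_flops` and the example model/token sizes are hypothetical, chosen only for illustration; they are not taken from the repository or the paper. It also assumes that "total tokens" means every token processed during training.

```python
def training_flops(num_params, num_tokens):
    """Approximate training compute as 6 * N * D.

    num_params: N, the number of model parameters.
    num_tokens: D, the total number of training tokens
                (assumed here to mean all tokens processed).
    """
    if num_params < 0 or num_tokens < 0:
        raise ValueError("num_params and num_tokens must be non-negative")
    return 6 * num_params * num_tokens


if __name__ == "__main__":
    # Hypothetical (N, D) pairs, not actual runs from the repository.
    runs = [
        (1_000_000_000, 20_000_000_000),
        (2_800_000_000, 55_000_000_000),
    ]
    for n, d in runs:
        print(f"N={n:.2e}  D={d:.2e}  FLOPs~{training_flops(n, d):.2e}")
```

This is only the rule-of-thumb approximation from the comment above, so treat the results as estimates rather than measured compute.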

borgr commented 11 months ago

Thanks, I actually saw that somewhere in the code/documentation but wasn't sure it always holds (it doesn't hold in general).