borgr closed this 11 months ago
Yeah, you can compute them using FLOPs = 6 * N * D, where N is the number of parameters and D is the total number of training tokens. The formula is also in the paper.
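For a quick calculation, here is a minimal sketch of the C ≈ 6 * N * D rule of thumb and its inverse (tokens for a given budget). The function names and the 1B-parameter / 20B-token numbers are hypothetical and for illustration only. The factor 6 is the usual approximation of about 2 FLOPs per parameter per token for the forward pass plus about 4 for the backward pass; it ignores attention and embedding costs, so treat results as estimates, not exact counts.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute via C ~= 6 * N * D.

    Assumes ~2 FLOPs/param/token forward and ~4 backward;
    attention and embedding FLOPs are ignored (hypothetical helper).
    """
    return 6.0 * n_params * n_tokens


def tokens_for_budget(flops: float, n_params: float) -> float:
    """Invert C ~= 6 * N * D to get the token count D for a budget C."""
    return flops / (6.0 * n_params)


# Hypothetical example: a 1B-parameter model trained on 20B tokens.
c = training_flops(1e9, 20e9)
print(f"{c:.2e}")  # 1.20e+20

# Going the other way: tokens a 1B model can see with that budget.
print(f"{tokens_for_budget(c, 1e9):.2e}")  # 2.00e+10
```

Because the rule is linear in both N and D, doubling either one doubles the estimated compute.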
Thanks, I actually saw that somewhere in the code/documentation, but wasn't sure it always holds (it doesn't hold in general).
In the contour there are many model sizes; is there a place that says how many FLOPs each of them used? (In the other one there are always 3 sizes, so the single table that specifies them is enough.) Or is there some easy formula for deducing FLOPs from the parameter count? Sorry for bothering you... you've got some really useful data lying around here.