Open Jsevillamol opened 3 years ago
Thank you for the pointer (and sorry for the late reply). 1 PF-day is 32 V100s running at typical efficiency for a day, so I get ~78 PF-days for Image GPT-L using the numbers from your source. Do you agree?
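That conversion is simple enough to sanity-check in a couple of lines (the 32-V100s-per-PF-day rule of thumb is the assumption stated above):

```python
# Convert reported V100-days to PF-days using the rule of thumb that
# 1 PF-day corresponds to 32 V100s running for a day at typical efficiency.
v100_days = 2500                     # reported iGPT-L training compute
v100s_per_pf_day = 32                # rule-of-thumb conversion from above

print(v100_days / v100s_per_pf_day)  # → 78.125 PF-days
```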
Let me do this again on my own to double check:
We have that "iGPT-L was trained for roughly 2500 V100-days" (source)
I assume this is the NVIDIA Tesla V100 GPU. According to its specifications, the NVIDIA Tesla V100 has 7 to 8.2 TFLOPS of peak double-precision performance, 14 to 16.4 TFLOPS of peak single-precision performance, and 112 to 130 TFLOPS of peak tensor performance (source).
I suppose the one that makes sense to use is peak tensor performance, i.e. ~125 TFLOPS. Following OpenAI's AI and Compute, we apply a 0.33 utilization factor (source).
In total we get 2500 V100-days * (24*60*60) seconds/day * 125 TFLOPS * 0.33 = 8.91e21 FLOP ≈ 89.1 PF-days (taking 1 PF-day ≈ 1e20 FLOP, as in AI and Compute).
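The full estimate above can be reproduced with a short script; the 125 TFLOPS peak tensor throughput, the 0.33 utilization factor, and the rounded 1 PF-day ≈ 1e20 FLOP convention are the assumptions stated in this thread, not measured values:

```python
# Estimate iGPT-L training compute from the reported 2500 V100-days.
v100_days = 2500
seconds_per_day = 24 * 60 * 60     # 86,400 s
peak_tensor_flops = 125e12         # ~125 TFLOPS peak tensor performance (assumed)
utilization = 0.33                 # utilization factor from AI and Compute (assumed)

total_flop = v100_days * seconds_per_day * peak_tensor_flops * utilization
pf_day_flop = 1e20                 # rounded 1 PF-day ~ 1e20 FLOP convention

print(f"{total_flop:.3g} FLOP")                  # → 8.91e+21 FLOP
print(f"{total_flop / pf_day_flop:.1f} PF-days") # → 89.1 PF-days
```

Note that with the exact conversion (1 PF-day = 1e15 * 86400 = 8.64e19 FLOP) the same total comes out to ~103 PF-days, so the rounding convention matters when comparing against the ~78 PF-day figure.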
So I think I agree with your estimate.
Note that this is different from the estimate currently in the repository.
Also, where did you find the data for the iGPT-XL version?
Yes, we will update this together with all the recent models soon. @slippylolo found these numbers and might be able to say more about the source.
Image GPT-L compute is 248 PF-days, same as Image GPT-XL.
I find it implausible that the amount of compute is the same as the larger model.
Here is an alternate (not 100% reliable) source, which estimates 6.5e21 FLOP total: https://www.lesswrong.com/posts/wfpdejMWog4vEDLDg/ai-and-compute-trend-isn-t-predictive-of-what-is-happening