Open Jsevillamol opened 3 years ago
Thank you for the pointer (and sorry for the late reply). 1 PF-day is 32 V100s running at typical efficiency for a day, so I get ~78 PF-days for Image GPT-L using the numbers from your source. Do you agree?
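That conversion is simple enough to sanity-check in a couple of lines (the 32-V100s-per-PF-day rule of thumb is the assumption stated above):

```python
# Convert reported V100-days to PF-days using the rule of thumb that
# 1 PF-day corresponds to 32 V100s running for a day at typical efficiency.
v100_days = 2500                     # reported iGPT-L training compute
v100s_per_pf_day = 32                # rule-of-thumb conversion from above

print(v100_days / v100s_per_pf_day)  # → 78.125 PF-days
```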
Let me do this again on my own to double check:
We have that "iGPT-L was trained for roughly 2500 V100-days" (source)
I assume this is the NVIDIA Tesla V100 GPU. According to its specifications, the NVIDIA Tesla V100 has 7 to 8.2 TFLOPS of peak double-precision performance, 14 to 16.4 TFLOPS of peak single-precision performance, and 112 to 130 TFLOPS of peak tensor performance (source).
I suppose the one that makes sense to use is peak tensor performance, i.e. ~125 TFLOPS. Following OpenAI's AI and Compute, we apply a 0.33 utilization factor (source).
In total we get 2500 V100-days * (24*60*60) seconds/day * 125 TFLOPS * 0.33 = 8.91e21 FLOP ≈ 89.1 PF-days (taking 1 PF-day ≈ 1e20 FLOP, as in AI and Compute).
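The full estimate above can be reproduced with a short script; the 125 TFLOPS peak tensor throughput, the 0.33 utilization factor, and the rounded 1 PF-day ≈ 1e20 FLOP convention are the assumptions stated in this thread, not measured values:

```python
# Estimate iGPT-L training compute from the reported 2500 V100-days.
v100_days = 2500
seconds_per_day = 24 * 60 * 60     # 86,400 s
peak_tensor_flops = 125e12         # ~125 TFLOPS peak tensor performance (assumed)
utilization = 0.33                 # utilization factor from AI and Compute (assumed)

total_flop = v100_days * seconds_per_day * peak_tensor_flops * utilization
pf_day_flop = 1e20                 # rounded 1 PF-day ~ 1e20 FLOP convention

print(f"{total_flop:.3g} FLOP")                  # → 8.91e+21 FLOP
print(f"{total_flop / pf_day_flop:.1f} PF-days") # → 89.1 PF-days
```

Note that with the exact conversion (1 PF-day = 1e15 * 86400 = 8.64e19 FLOP) the same total comes out to ~103 PF-days, so the rounding convention matters when comparing against the ~78 PF-day figure.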
So I think I agree with your estimate.
Note that this is different from the estimate currently in the repository.
Also, where did you find the data for the iGPT-XL version?
Yes, we will update this together with all the recent models soon. @slippylolo found these numbers and might be able to say more about the source.
Image GPT-L compute is 248 PF-days, same as Image GPT-XL.
I find it implausible that the amount of compute is the same as the larger model.
Here is an alternate (not 100% reliable) source, which estimates 6.5e21 FLOP total: https://www.lesswrong.com/posts/wfpdejMWog4vEDLDg/ai-and-compute-trend-isn-t-predictive-of-what-is-happening