google / ml-compiler-opt

Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.
Apache License 2.0

Minimum NVIDIA chip for training #301

Open reedkotler opened 11 months ago

reedkotler commented 11 months ago

What kind of NVIDIA chip would I need for training?

TIA

mtrofin commented 11 months ago

We didn't use any kind of acceleration, but if you want to, I'd assume anything that TensorFlow supports.

boomanaiden154 commented 11 months ago

Anything that TensorFlow supports should work, but note that, at least with how the code is currently set up, copying the data to the GPU, running the training iterations there, and copying everything back to the CPU takes longer than just doing the training on the CPU, at least when I last tested it. There has been some experimentation with much larger models where acceleration (GPU/TPU) has definitely helped, but none of those models are upstream in this repository.
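Since CPU training was faster in those tests, one way to keep TensorFlow from using a GPU is to hide the CUDA devices before TensorFlow is imported. This is a minimal sketch, not part of this repository's tooling; it relies on the standard `CUDA_VISIBLE_DEVICES` environment variable honored by the CUDA runtime:

```python
import os

# Hide all CUDA devices so TensorFlow falls back to the CPU.
# This must be set before TensorFlow is first imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

try:
    import tensorflow as tf
    # With CUDA devices hidden, no GPUs should be visible.
    print(tf.config.list_physical_devices("GPU"))
except ImportError:
    # TensorFlow is not installed in this environment; the
    # environment variable still takes effect for any later import.
    pass
```

Alternatively, `tf.config.set_visible_devices([], "GPU")` can be called after import, but the environment variable approach works regardless of where in the code TensorFlow is first loaded.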

reedkotler commented 11 months ago

So even then, a modern Mac M1 should work?

reedkotler commented 11 months ago

I see, just a modern CPU with 96 HW threads? "for local training, which is currently the only supported mode, we recommend a high-performance workstation (e.g. 96 hardware threads)."

mtrofin commented 11 months ago

Yup. The bottleneck is currently compile time.

reedkotler commented 11 months ago

Do you know if anyone is using a Mac for this?

boomanaiden154 commented 11 months ago

There was some experimentation with using a Mac (see patches like https://github.com/google/ml-compiler-opt/pull/260), but it's not really a platform where all the tooling here is guaranteed to work. The tooling in this repository is developed and run almost exclusively on Linux; running it on a Mac should theoretically work, maybe minus some slight issues.