According to the ASTRA-sim 2.0 paper, ASTRA-sim drives its simulations with Chakra traces in order to "decouple parallelization strategies from the ASTRA-sim implementation". Does that mean we first have to trace a real AI training system with 10,000 GPUs before we can simulate and analyze a system at that scale?
Thanks