We never got to test this on H100, I think. cc @janekl if you've tried on H100.
Right, the development and testing involved only A100.
To achieve this, at a minimum you would need CUDA 12 and to compile FBGEMM for the Hopper architecture (SM90). But I have never tried this myself.
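For reference, a minimal sanity check along these lines might look like the sketch below. The version thresholds and the bare `fbgemm_gpu` import are assumptions based on the comments above, not a validated configuration.

```python
# Sketch: verify a CUDA 12 runtime, a Hopper (SM90) device, and that the
# installed fbgemm_gpu build can actually load its GPU operators.
import torch

print("PyTorch:", torch.__version__)
print("CUDA runtime:", torch.version.cuda)  # expect a 12.x string for Hopper
assert torch.cuda.is_available(), "No CUDA device visible"

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: sm_{major}{minor}")  # H100 reports sm_90
assert (major, minor) >= (9, 0), "Device is not Hopper (SM90) or newer"

# Importing fbgemm_gpu registers its CUDA ops; a failure here usually means
# the wheel/build was not compiled with kernels for this architecture.
import fbgemm_gpu  # noqa: F401
print("fbgemm_gpu imported from:", fbgemm_gpu.__file__)
```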
Closing, as the reference was not tested on H100s. Note that there were multiple H100 DLRMv2 submissions in the MLPerf Training v4.0 round, as shown in the results table.
Training v4.0 implementations are in this repo.
Does DLRM_v2 support H100? If supported, what is the environment you used? I have tried CUDA 11.8 with PyTorch 1.14.0 or PyTorch 2.1, torchrec 0.3.2 or 0.4.0, and fbgemm_gpu 0.3.2 or 0.4.1. However, none of the above environments works.