xyyintel opened this issue 1 year ago
We never got to test this on H100 I think. cc @janekl if you've tried on H100.
Right, the development and testing involved only A100.
To achieve this you would, at minimum, need CUDA 12 and a build of FBGEMM compiled for the Hopper architecture (SM90). I have never tried this myself, though.
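A rough sketch of what such a source build might look like, assuming the standard `fbgemm_gpu` setup script and the `TORCH_CUDA_ARCH_LIST` variable used by PyTorch's CUDA extension builder (none of this has been verified on an actual H100):

```shell
# Hypothetical, untested sketch: build fbgemm_gpu from source with a
# CUDA 12 toolkit, targeting Hopper (SM90) in addition to Ampere (SM80).
git clone --recursive https://github.com/pytorch/FBGEMM.git
cd FBGEMM/fbgemm_gpu

# TORCH_CUDA_ARCH_LIST tells the PyTorch extension build which compute
# capabilities to compile for; "9.0" corresponds to H100 / Hopper.
export TORCH_CUDA_ARCH_LIST="8.0;9.0"

python setup.py install
```

The key point is simply that the prebuilt wheels may not include SM90 kernels, so the architecture list has to request them explicitly at compile time.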
Does DLRM_v2 support H100? If so, what environment did you use? I have tried CUDA 11.8 with PyTorch 1.14.0 or PyTorch 2.1, torchrec 0.3.2 or 0.4.0, and fbgemm_gpu 0.3.2 or 0.4.1. However, none of the above environments works.