Closed GuillaumeHolley closed 2 years ago
@GuillaumeHolley,
So this message:
A100-SXM4-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
is related to this: https://discuss.pytorch.org/t/pytorch-1-7-0-support-for-a100-gpus/108211
Can you please let me know which version of PyTorch you have on your system?
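If it helps, something along these lines should print the relevant information. This is only a rough sketch: `capability_supported` is a helper made up for illustration, and its rule (same major version, compiled minor version not above the device's) is a simplification of CUDA binary compatibility that ignores PTX fallback. It also assumes a PyTorch release recent enough to provide `torch.cuda.get_arch_list()`, and falls back gracefully otherwise.

```python
def capability_supported(capability, arch_list):
    """Hypothetical helper: True if some compiled sm_XY entry can run on the device.

    Simplified rule: same major version, and compiled minor <= device minor.
    Non-"sm_" entries (for example PTX "compute_XY" targets) are ignored.
    """
    major, minor = capability
    for arch in arch_list:
        if not arch.startswith("sm_"):
            continue
        digits = arch[len("sm_"):]
        if not digits.isdigit():
            continue
        arch_major, arch_minor = divmod(int(digits), 10)
        if arch_major == major and arch_minor <= minor:
            return True
    return False


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return

    print("PyTorch version:", torch.__version__)
    print("Built against CUDA:", torch.version.cuda)

    if not torch.cuda.is_available():
        print("No usable CUDA device detected")
        return

    # get_arch_list may be missing on older releases, so look it up defensively.
    get_arch_list = getattr(torch.cuda, "get_arch_list", None)
    arch_list = get_arch_list() if get_arch_list is not None else []
    print("Compiled architectures:", " ".join(arch_list) or "(unknown)")

    for index in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(index)
        capability = torch.cuda.get_device_capability(index)
        status = "ok" if capability_supported(capability, arch_list) else "NOT covered"
        print(f"GPU {index}: {name} sm_{capability[0]}{capability[1]} -> {status}")


if __name__ == "__main__":
    main()
```

With the architecture list from the message above (`sm_37 sm_50 sm_60 sm_70`), an A100 reporting capability `(8, 0)` would come out as not covered, which matches the warning.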
Unfortunately, this machine is rather busy and I haven't been able to get access to it again. I believe the link you provided is the solution, so I will close this for now. Thanks for the help!
Hi hi,
So I am in the midst of training Pepper SNP on my HG002 corrected with Ratatosk. I am currently running step 4 which is the training step itself. I have a machine with 2 GPUs and it is running fine but given 1000 epochs, I estimate the total wall-clock time to be just under 3 days. I tried to accelerate this by running the same step on a DGX1 server with 8 GPUs and I get the following error:
Seems like the PyTorch package doesn't handle this GPU architecture, which seems weird to me.
Thanks, Guillaume