jaywalnut310 / vits

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://jaywalnut310.github.io/vits-demo/index.html
MIT License
6.91k stars 1.27k forks source link

Can't train with fp16 on Nvidia RTX3060 #45

Open thangnvkcn opened 2 years ago

thangnvkcn commented 2 years ago

training with fp16 doesn't work for me on a RTX3060, I'll look into fixing it, but for future reference here is the full stacktrace torch version 1.9.0

INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for 1 nodes. THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=710 : device-side assert triggered terminate called after throwing an instance of 'std::runtime_error' what(): NCCL error in: /pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:158, unhandled cuda error, NCCL version 2.7.8 ncclUnhandledCudaError: Call to CUDA function failed.

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

nikich340 commented 2 years ago

maybe problem in too new torch version, have you tried 1.8.1+cu111 ?