Closed: cliuxinxin closed this issue 2 years ago
What is your n_speakers parameter set to?
Check that the number is high enough for your dataset.
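For context: the speaker ID from the filelist ends up as an index into a speaker embedding table with n_speakers rows, so every ID has to lie in the range 0 .. n_speakers - 1; an ID equal to (or larger than) n_speakers is exactly what trips the srcIndex < srcSelectDimSize assertion on the GPU. A minimal sketch of the failure mode, with made-up sizes rather than the actual VITS code:

```python
import torch
import torch.nn as nn

n_speakers = 3
emb = nn.Embedding(n_speakers, 256)   # stand-in for the speaker embedding

ok = torch.tensor([0, 1, 2])
print(emb(ok).shape)                  # fine: valid IDs are 0, 1, 2

bad = torch.tensor([3])               # ID 3 does not exist in a 3-row table
try:
    emb(bad)                          # on CPU this raises IndexError;
                                      # on CUDA the same lookup fires the
                                      # "srcIndex < srcSelectDimSize" assert
except IndexError as err:
    print(err)
```

The cuDNN error further down your trace is usually just fallout: once the device-side assert fires, the CUDA context is broken and whatever kernel runs next (here a conv in the posterior encoder) reports a misleading internal error.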
@CookiePPP
Thank you for your valuable advice.
I have three voices in my data, and I'm using the IDs 1, 2, 3.
I had set n_speakers to 3.
Following your suggestion, I changed it to 5, and it started to work.
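For anyone who hits the same thing: since the IDs are 0-based embedding indices, IDs 1, 2, 3 need n_speakers of at least 4 (I simply rounded up to 5). A quick check over the training filelist would have caught this early. Below is only a sketch with a hypothetical helper name; it assumes the usual path|speaker_id|text layout, so adjust the column index if your filelist differs:

```python
def check_speaker_ids(filelist_path, n_speakers):
    """Collect speaker IDs from a pipe-separated filelist and verify their range."""
    ids = set()
    with open(filelist_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("|")
            if len(parts) >= 2:
                ids.add(int(parts[1]))
    if not ids:
        raise ValueError(f"no speaker IDs found in {filelist_path}")
    if max(ids) >= n_speakers:
        raise ValueError(
            f"max speaker id {max(ids)} requires n_speakers >= {max(ids) + 1}"
        )
    return ids

# e.g. check_speaker_ids("filelists/train_multispeaker.txt.cleaned", 5)
```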
Hello:
Thank you so much for your great work.
When I trained a single-speaker model, everything was fine, but when I train the multi-speaker model, the error below is displayed.
===============================================
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [18,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [19,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [20,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [21,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [22,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [23,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [24,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [25,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [26,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [27,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1088: indexSelectSmallIndex: block: [0,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

Traceback (most recent call last):
  File "train_ms.py", line 296, in <module>
    main()
  File "train_ms.py", line 50, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/root/liuxinxin/vits/train_ms.py", line 120, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
  File "/root/liuxinxin/vits/train_ms.py", line 148, in train_and_evaluate
    (z, z_p, m_p, logs_p, m_q, logs_q) = net_g(x, x_lengths, spec, spec_lengths, speakers)
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/liuxinxin/vits/models.py", line 467, in forward
    z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/liuxinxin/vits/models.py", line 236, in forward
    x = self.pre(x) * x_mask
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/root/liuxinxin/v_env/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([1, 513, 1, 526], dtype=torch.half, device='cuda', requires_grad=True)
net = torch.nn.Conv2d(513, 192, kernel_size=[1, 1], padding=[0, 0], stride=[1, 1], dilation=[1, 1], groups=1)
net = net.cuda().half()
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
  memory_format = Contiguous
  data_type = CUDNN_DATA_HALF
  padding = [0, 0, 0]
  stride = [1, 1, 0]
  dilation = [1, 1, 0]
  groups = 1
  deterministic = false
  allow_tf32 = true
input: TensorDescriptor 0xa6494bc0
  type = CUDNN_DATA_HALF
  nbDims = 4
  dimA = 1, 513, 1, 526,
  strideA = 269838, 526, 526, 1,
output: TensorDescriptor 0xa6495e50
  type = CUDNN_DATA_HALF
  nbDims = 4
  dimA = 1, 192, 1, 526,
  strideA = 100992, 526, 526, 1,
weight: FilterDescriptor 0xa648fa90
  type = CUDNN_DATA_HALF
  tensor_format = CUDNN_TENSOR_NCHW
  nbDims = 4
  dimA = 192, 513, 1, 1,
Pointer addresses:
  input: 0x7f4112800000
  output: 0x7f41129b4a00
  weight: 0x7f4112984800
======================================
I only have a single 3090 GPU.
Any relevant advice would be greatly appreciated.
I have tried:
It doesn't work.
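One general note on reading a trace like the one above: the indexSelectSmallIndex assertions mean some index fed to an embedding or index_select lookup was greater than or equal to the table size (here, a speaker ID >= n_speakers), and the later CUDNN_STATUS_INTERNAL_ERROR is typically just collateral damage from that device-side assert. Because CUDA launches are asynchronous, the Python traceback often points at the wrong operation; forcing synchronous launches makes it stop at the real one. A minimal sketch, assuming the environment variable is set before CUDA is initialized:

```python
import os

# Force synchronous kernel launches so device-side asserts surface at the
# Python line that actually triggered them. This must be set before the
# first CUDA call (safest: before importing torch).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (imported after setting the env var on purpose)
```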