Hi @DiDimus, please try specifying a GPU index if you have several GPUs in your machine, e.g. CUDA_VISIBLE_DEVICES=0 for the first GPU.
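A minimal sketch of the same idea from inside Python, in case prefixing the command is inconvenient (this is just an illustration, not part of the repo); the variable has to be set before CUDA is initialized, so setting it before importing torch is safest:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only the first GPU to this process
import torch
print(torch.cuda.device_count())  # should report 1 if the variable took effect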
Thanks, but the result is exactly the same. I think the problem is in the software environment. Do you have a Docker image for this project? Which OS do you use?
Gotcha, you can refer to this: https://github.com/keonlee9420/Daft-Exprt/blob/main/Dockerfile
I think the Dockerfile should also work for this project. Please try it out and let me know the result.
This clearly looks like an error in the project code related to the predicted tensor size:
With duration_control = 0.3: RuntimeError: The size of tensor a (25) must match the size of tensor b (31); x shape is torch.Size([1, 31, 256]); mask shape is torch.Size([1, 25]). The right value is 31 (104 * 0.3).
With duration_control = 0.5: RuntimeError: The size of tensor a (47) must match the size of tensor b (52); x shape is torch.Size([1, 52, 256]); mask shape is torch.Size([1, 47]). The right value is 52 (104 * 0.5).
With duration_control = 1.0 everything is OK: x shape is torch.Size([1, 104, 256]); mask shape is torch.Size([1, 104]).
With duration_control = 2.0 everything is OK: x shape is torch.Size([1, 208, 256]); mask shape is torch.Size([1, 208]).
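For what it's worth, the shape collision itself is easy to reproduce in isolation. This is only an illustrative sketch using the shapes from the 0.3 case, not the repo's code:

import torch
x = torch.randn(1, 31, 256)                  # expanded phoneme features (length 31)
mask = torch.zeros(1, 25, dtype=torch.bool)  # mask built from a stale mel_len of 25
x = x.masked_fill(mask.unsqueeze(-1), 0.0)   # raises the same kind of size-mismatch RuntimeError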
Yes, @Vadim2S. Problem found, thanks. The Docker image from Daft-Exprt didn't help :(
Hey guys, I just found that you had an issue with control values lower than 1. Sorry for the late correction, and thanks to @Vadim2S, I can confirm that there is an error in the current code. I'll fix it and push soon. Thank you all for the report!
Temporary workaround:
In model/modules.py, line 177, class LengthRegulator(nn.Module):
change

    if max_len is not None:
        output = pad(output, max_len)
    else:
        output = pad(output)

to:

    if max_len is not None:
        output = pad(output, max_len)
        # VVS: keep mel_len consistent with the padded output so the downstream mask matches
        mel_len.clear()
        mel_len.append(output.shape[1])
    else:
        output = pad(output)
P.S. The duration predictions are real-valued (floats), and in LengthRegulator.expand you do:

    for i, vec in enumerate(batch):
        expand_size = predicted[i].item()
        out.append(vec.expand(max(int(expand_size), 0), -1))
    out = torch.cat(out, 0)

Of course, out can end up shorter than max_len due to rounding (int() truncates each expand_size). I presume out must then be extended to max_len later.
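A tiny self-contained sketch of that rounding effect (the duration values below are made up for illustration, not taken from the model): truncating each scaled duration gives a total that is never larger than scaling and rounding the summed duration, which is exactly the kind of 25-vs-31 gap reported above.

import torch

durations = torch.tensor([3, 5, 2, 7, 4, 6, 3, 5])  # hypothetical per-phoneme frame counts
control = 0.3

# what expand() effectively does: scale each duration, then truncate with int()
per_phoneme_total = sum(int(d.item() * control) for d in durations)

# what the "right value" corresponds to: scale the summed duration as a whole
scaled_total = int(round(durations.sum().item() * control))

print(per_phoneme_total, scaled_total)  # 6 10 -> per-element truncation loses frames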
I fixed the code and it's working now. The problem originated from the value of max_len at inference time in VarianceAdaptor: it should be None, but the max_len of the reference audio was wrongly passed.
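A minimal sketch of the intended behaviour, assuming a FastSpeech2-style length regulator (names and signatures here are illustrative, not the repo's exact code): when max_len is None, the output is padded to the longest expanded sequence in the batch and mel_len is taken from the expansion itself, so the mask always matches output.shape[1].

import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

def length_regulate(x, durations, max_len=None):
    # x: [B, T_phon, H] phoneme features; durations: [B, T_phon] integer frame counts
    expanded = [torch.repeat_interleave(seq, dur, dim=0) for seq, dur in zip(x, durations)]
    mel_len = torch.tensor([e.shape[0] for e in expanded])
    output = pad_sequence(expanded, batch_first=True)  # pad to the longest in the batch
    if max_len is not None:                            # training: match ground-truth mel length
        pad_amount = max(max_len - output.shape[1], 0)
        output = F.pad(output, (0, 0, 0, pad_amount))[:, :max_len]
    return output, mel_len

# At synthesis time with a reference audio, the reference mel length should not be
# passed here; calling with max_len=None keeps the mask built from mel_len in sync:
# output, mel_len = length_regulate(x, duration_rounded, max_len=None)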
Thanks! Tested; low duration values work OK now!
Hi, I'm trying to run your project. I use CUDA 10.1, all requirements are installed (with torch 1.8.1), and all models are preloaded. But I get an error when running:
python3 synthesize.py --text "Hello world" --restore_step 200000 --mode single -p config/LibriTTS/preprocess.yaml -m config/LibriTTS/model.yaml -t config/LibriTTS/train.yaml --duration_control 0.8 --energy_control 0.8 --ref_audio ref.wav