Closed: ezerhouni closed this issue 8 months ago.
Can you try torch.min(inputs, dim=None)?
The error shows you need to specify the dim argument for torch.min(), though your code looks correct to me.
Same issue: https://github.com/coqui-ai/TTS/issues/2555
It comes from a bad data file that doesn't align properly.
@ezerhouni
I suggest that you use https://github.com/rhasspy/piper-phonemize to convert text to tokens.
Otherwise, it may be difficult, if not impossible, to deploy the trained model with C++.
You can find pre-built wheels for Linux and Windows at https://github.com/csukuangfj/piper-phonemize/releases/tag/2023.12.5
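A minimal sketch of converting text to tokens with it, assuming the phonemize_espeak function exposed by those wheels (the phoneme-to-ID mapping below is made up for illustration; in practice it must match the symbol table used at training time):

from piper_phonemize import phonemize_espeak

text = "How are you doing?"
# phonemize_espeak returns one list of phonemes per sentence.
phoneme_lists = phonemize_espeak(text, "en-us")

# Illustrative symbol table built on the fly; a real one comes from
# the tokens file shipped with the trained model.
token2id = {p: i for i, p in enumerate(sorted({p for ps in phoneme_lists for p in ps}))}
token_ids = [token2id[p] for ps in phoneme_lists for p in ps]
print(token_ids)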
@yaozengwei
Do you have any example code for using piper-phonemize to convert text to tokens?
@csukuangfj Let me try torch.min(inputs, dim=None)
I am trying the LJSpeech recipe for the moment with VITS-2
Ok, but we are switching to piper-phonemize for converting text to tokens.
Hope that @yaozengwei can push the new tokenizer soon.
I just uploaded the code here https://github.com/k2-fsa/icefall/pull/1511.
@csukuangfj Now I am getting:
File "/vits2/egs/ljspeech/TTS/vits2/duration_predictor.py", line 191, in forward
z = flow(z, x_mask, g=x, inverse=inverse)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/vits2/egs/ljspeech/TTS/vits2/flow.py", line 297, in forward
xb, logdet_abs = piecewise_rational_quadratic_transform(
File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 38, in piecewise_rational_quadratic_transform
outputs, logabsdet = spline_fn(
File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 85, in unconstrained_rational_quadratic_spline
) = rational_quadratic_spline(
File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 175, in rational_quadratic_spline
assert (discriminant >= 0).all()
AssertionError
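For reference, a hypothetical debugging sketch of what could be logged right before the failing assert in rational_quadratic_spline (the variable name is taken from the traceback; the surrounding computation is assumed):

# Hypothetical lines to place just before `assert (discriminant >= 0).all()`
# in transform.py; `discriminant` is the tensor named in the traceback.
bad = discriminant < 0
if bad.any():
    print("negative discriminants:", bad.sum().item(), "out of", bad.numel())
    print("most negative value:", discriminant.min().item())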
I will try with the new tokenizer to see if it fixes the issue
@yaozengwei Could you have a look at the above error?
Hello, I am trying to implement VITS2 but I am getting the following error:
File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 38, in piecewise_rational_quadratic_transform outputs, logabsdet = spline_fn( File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 85, in unconstrained_rational_quadratic_spline ) = rational_quadratic_spline( File "/vits2/egs/ljspeech/TTS/vits2/transform.py", line 118, in rational_quadratic_spline if torch.min(inputs) < left or torch.max(inputs) > right: RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
Do you have an idea where it might come from? I know that without code it is difficult to tell; I will open a PR with the implementation later this week. Thank you.
It seems the tensor inputs passed to torch.min is empty:
>>> import torch
>>> a = torch.empty((0,))
>>> torch.min(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: min(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
An empty tensor will indeed throw the same error.
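If the empty tensor comes from a masked selection with zero elements, a guard sketch like the following would avoid the crash at the check in transform.py line 118 (this only hides the symptom; the empty selection itself still needs explaining upstream):

import torch

inputs = torch.empty((0,))
left, right = -1.0, 1.0  # placeholder bounds for the spline domain
# torch.min/torch.max raise the RuntimeError above on empty tensors,
# so check numel() before the range check.
if inputs.numel() > 0 and (torch.min(inputs) < left or torch.max(inputs) > right):
    raise ValueError("input is outside the spline domain")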
@csukuangfj I might have some good news but it needs a bit more testing. I will let you know next week
Unrelated to VITS-2 (please tell me if you would prefer that I open a separate issue): it seems that for the VITS recipes, the input features are computed with Wav2Spec, while the loss is computed using Wav2LogFilterBank. Is this on purpose?
Hmm, I think we didn't choose this setup on purpose. @yaozengwei, am I right?
We just follow the VITS paper (https://arxiv.org/pdf/2106.06103.pdf), which uses the linear spectrogram as input to the posterior encoder (Sec. 2.1.3 and Fig. 1) and mel-scale spectrograms to compute the reconstruction loss (Sec. 2.1.2).
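To illustrate the two feature types with torchaudio (not the recipe's actual Wav2Spec/Wav2LogFilterBank classes; the parameter values are the common LJSpeech settings and are assumptions here):

import torch
import torchaudio

wav = torch.randn(1, 22050)  # one second of dummy audio at 22.05 kHz

# Linear (magnitude) spectrogram: input to the posterior encoder (Sec. 2.1.3).
linear_spec = torchaudio.transforms.Spectrogram(
    n_fft=1024, hop_length=256, power=1
)(wav)

# Mel-scale spectrogram: used for the reconstruction loss (Sec. 2.1.2).
mel_spec = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80
)(wav)

print(linear_spec.shape, mel_spec.shape)  # torch.Size([1, 513, 87]) torch.Size([1, 80, 87])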
@yaozengwei Yes, my bad; I misunderstood part of the code.