Formally speaking, the shape of the variable `y_cut_mask` created here may not match the shape of the variable `y_cut` in the last dimension (which is `out_size` for `y_cut`).
To see why, take a look at the function `sequence_mask`, which is called to create `y_cut_mask`. Since the parameter `max_length` is not provided, the length dimension defaults to size `max(length)` (see here). Thus, if every sequence in a batch passed to `GradTTS.forward(...)` is shorter than `out_size`, the last dimension of `y_cut_mask` will not match the last dimension of `y_cut`.
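A minimal sketch of the shape problem, using a NumPy stand-in for the torch `sequence_mask` (same signature and shape semantics as the original; the concrete `out_size` and length values are made up for illustration):

```python
import numpy as np

def sequence_mask(length, max_length=None):
    # NumPy sketch of the torch sequence_mask used in Grad-TTS.
    if max_length is None:
        max_length = int(length.max())  # width defaults to max(length)
    return np.arange(max_length)[None, :] < length[:, None]

out_size = 172             # hypothetical crop size
lengths = np.array([100])  # batch_size == 1, sequence shorter than out_size
mask = sequence_mask(lengths)
print(mask.shape)          # (1, 100) -- does not match y_cut's (1, 172)
```

Because `max_length` falls back to `max(length)`, the mask width here is 100 while `y_cut` keeps its last dimension at `out_size`.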
A simple experiment exposes the issue: start training Grad-TTS with `batch_size == 1`. Then, as soon as a batch contains a sequence shorter than `out_size`, training fails with a shape mismatch.
The fix I suggest is elementary: pass the parameter `max_length=out_size` when calling `sequence_mask` here.
Moreover, it would be better to skip cropping the mel entirely when every sequence in the batch passed to `GradTTS.forward(...)` is shorter than `out_size`. Concretely, I suggest adding the condition `y_max_length > out_size` here.
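Both suggested changes can be sketched together. The helper name `make_cut_mask` is hypothetical (the real logic lives inline in `GradTTS.forward`), and NumPy stands in for torch:

```python
import numpy as np

def sequence_mask(length, max_length=None):
    # NumPy stand-in for the torch sequence_mask; same shape semantics.
    if max_length is None:
        max_length = int(length.max())
    return np.arange(max_length)[None, :] < length[:, None]

def make_cut_mask(y_lengths, out_size):
    # Hypothetical helper combining both proposals:
    # 1) skip cropping when no sequence exceeds out_size;
    # 2) otherwise pin the mask width with max_length=out_size.
    y_max_length = int(y_lengths.max())
    if y_max_length <= out_size:
        return None  # no cropping needed; keep the full mel and its mask
    y_cut_lengths = np.minimum(y_lengths, out_size)
    return sequence_mask(y_cut_lengths, max_length=out_size)

print(make_cut_mask(np.array([100]), out_size=172))             # None: crop skipped
print(make_cut_mask(np.array([100, 300]), out_size=172).shape)  # (2, 172)
```

With `max_length=out_size` pinned, the mask's last dimension always matches `y_cut`, regardless of the batch's sequence lengths.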