Open fabiocat93 opened 11 months ago
I'm running into this issue too
I think the error might be that your embedding is larger than the positional encoding
I am running into this issue too. If I change the input audio but keep the embedding the same, the issue goes away, so it seems to be related to the input audio rather than the embedding. Has anyone fixed it?
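In case it helps anyone narrow this down: if the failure really depends on the input audio, one thing worth checking is whether the failing clips are simply longer than the working ones. Below is a minimal sketch for measuring clip duration and splitting a long clip into shorter pieces before conversion. It assumes the problem is length-related, which nobody in this thread has confirmed. The helper names `duration_seconds` and `split_waveform` and the `max_seconds` value are made up for illustration and are not part of SpeechT5 or transformers.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # the data in this thread is 16 kHz mono


def duration_seconds(waveform: Sequence[float], sample_rate: int = SAMPLE_RATE) -> float:
    """Return the clip length in seconds, given one sample per element."""
    return len(waveform) / sample_rate


def split_waveform(
    waveform: Sequence[float],
    max_seconds: float,  # hypothetical limit; tune it by experiment
    sample_rate: int = SAMPLE_RATE,
) -> List[Sequence[float]]:
    """Split a waveform into consecutive chunks of at most max_seconds each."""
    chunk_len = int(max_seconds * sample_rate)
    if chunk_len < 1:
        raise ValueError("max_seconds is too small for this sample rate")
    # Slice in fixed steps; the last chunk may be shorter than the others.
    return [waveform[i:i + chunk_len] for i in range(0, len(waveform), chunk_len)]


# Example: a 2.5 s silent clip split into pieces of at most 1 s.
clip = [0.0] * 40_000
chunks = split_waveform(clip, max_seconds=1.0)
print(duration_seconds(clip), [len(c) for c in chunks])
```

If the short chunks convert cleanly while the full clip fails, that would point to a length limit rather than to the speaker embedding. Note that hard cuts like this can land in the middle of a word, so splitting on silence would probably sound better in practice.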
I am working through the voice conversion tutorial (https://huggingface.co/blog/speecht5) to convert some audio input into a target voice, and everything works fine. Next, I tried the code on my own data. The files are all mono, 16 kHz, 16-bit. Most of them work fine, but for some of them I get the following error:
Has anybody faced anything similar?