unixpickle / vq-voice-swap

Voice swapping with VQ-VAE and diffusion models

convert using pretrained fails #10

Open evolu8 opened 2 years ago

evolu8 commented 2 years ago

I get the following error when running your quick 'try it yourself' code. I double-checked, and my input.wav is more than 4 seconds long. Maybe if you upload an input WAV that does work, I could go from there and help troubleshoot?

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 324 but got size 325 for tensor number 1 in the list.

unixpickle commented 2 years ago

What is the stack trace of that error? Just want to make sure it's not something else.

unixpickle commented 2 years ago

Here is an example file that should work: photon.wav.gz

evolu8 commented 2 years ago

Thank you so much. Your file worked fine. Mine gives this:

(py37) phil@besast:~/code/vq-voice-swap$ python3 sample_vqvae.py --encoding ulaw --input-file input.wav --label 0 model_ema_0.99.pt output.wav
loading model from checkpoint...
loading waveform from input.wav...
encoding audio sequence...
decoding audio samples...
0it [00:00, ?it/s]Traceback (most recent call last):
  File "sample_vqvae.py", line 96, in <module>
    main()
  File "sample_vqvae.py", line 55, in main
    enc_pred_scale=args.enc_pred_scale,
  File "/home/phil/code/vq-voice-swap/vq_voice_swap/vq_vae.py", line 144, in decode
    **kwargs,
  File "/home/phil/code/vq-voice-swap/vq_voice_swap/diffusion/diffusion.py", line 121, in ddpm_sample
    eps = predictor(x_t, ts)
  File "/home/phil/code/vq-voice-swap/vq_voice_swap/vq_vae.py", line 138, in <lambda>
    xs, ts, cond=cond_seq, labels=labels, **kwargs
  File "/home/phil/miniconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/phil/code/vq-voice-swap/vq_voice_swap/models/unet.py", line 156, in forward
    h = torch.cat([h, skips.pop()], axis=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 324 but got size 325 for tensor number 1 in the list.
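
The trace fails where the U-Net concatenates an upsampled activation with a saved skip connection, and the two sizes differ by exactly one (324 vs 325). One plausible explanation, not confirmed anywhere in this thread, is that the sequence length at some level is odd: halving with floor division and then doubling loses one sample. The sketch below only simulates that length arithmetic in plain Python. `unet_lengths` and `pad_to_multiple` are hypothetical helpers, not functions from vq-voice-swap, and the stride-2 downsample / 2x upsample behaviour is an assumption about the model.

```python
def unet_lengths(length, depth):
    """Simulate sequence lengths through a U-Net-like stack.

    Assumption: each level halves the length with floor division on the
    way down and doubles it on the way up. Returns a list of
    (upsampled_length, skip_length) pairs that would fail to concatenate.
    """
    skips = []
    cur = length
    for _ in range(depth):
        skips.append(cur)   # length saved for the skip connection
        cur = cur // 2      # stride-2 downsample drops a trailing odd sample
    mismatches = []
    for _ in range(depth):
        cur = cur * 2       # 2x upsample
        skip = skips.pop()
        if cur != skip:
            mismatches.append((cur, skip))
            cur = skip      # continue as if the mismatch had been fixed
    return mismatches


def pad_to_multiple(length, multiple):
    """Smallest length >= `length` that is divisible by `multiple`."""
    return -(-length // multiple) * multiple


# An odd length of 325 at one level reproduces the sizes in the error.
print(unet_lengths(325, depth=1))
# Padding the length to a multiple of 2**depth removes the mismatch
# under the assumptions above.
print(unet_lengths(pad_to_multiple(2597, 2 ** 3), depth=3))
```

If this is indeed the cause, padding or trimming the input waveform so its length is a multiple of the model's total downsampling factor before running sample_vqvae.py might avoid the error. The actual factor depends on the model configuration and is not stated here.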