haoheliu / versatile_audio_super_resolution

Versatile audio super resolution (any -> 48kHz) with AudioSR.
MIT License
1.08k stars · 106 forks

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 63 for tensor number 1 in the list. #37

Open vb87 opened 10 months ago

vb87 commented 10 months ago

I'm using the latest master and running on CUDA.

Here's the mp3 file I'm using as input: https://drive.google.com/file/d/1xR2mV-SctUknIvjKqlTYyFKHRl5annCX/view?usp=sharing

command line: python -m audiosr -i 5.01_22303.037073170733_23517.438009756097.mp3 -s . -d cuda

getting this error:

Loading AudioSR: speech
Loading model on cuda
D:\Soft\Python\Python38\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
D:\Soft\Python\Python38\lib\site-packages\torchaudio\transforms\_transforms.py:611: UserWarning: Argument 'onesided' has been deprecated and has no influence on the behavior of this module.
  warnings.warn(
DiffusionWrapper has 258.20 M params.
Running DDIM Sampling with 50 timesteps
DDIM Sampler:   0%|          | 0/50 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "D:\Soft\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Soft\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\__main__.py", line 115, in <module>
    waveform = super_resolution(
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\pipeline.py", line 168, in super_resolution
    waveform = latent_diffusion.generate_batch(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1525, in generate_batch
    samples, _ = self.sample_log(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1431, in sample_log
    samples, intermediates = ddim_sampler.sample(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 143, in sample
    samples, intermediates = self.ddim_sampling(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 237, in ddim_sampling
    outs = self.p_sample_ddim(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 293, in p_sample_ddim
    model_t = self.model.apply_model(x_in, t_in, c)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1030, in apply_model
    x_recon = self.model(x_noisy, t, cond_dict=cond)
  File "D:\Soft\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1686, in forward
    out = self.diffusion_model(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\modules\diffusionmodules\openaimodel.py", line 879, in forward
    h = th.cat([h, concate_tensor], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 63 for tensor number 1 in the list.
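A likely explanation for the 64-vs-63 mismatch (an assumption, not confirmed against the AudioSR code): the failing `th.cat` joins a UNet skip tensor with an upsampled decoder tensor, and with an odd number of latent frames the two sides of the network round the length differently. The numbers below are made up to illustrate the arithmetic:

```python
import math

# Hypothetical illustration: with an odd frame count, a path that rounds
# up (e.g. a padded stride-2 conv) and a path that rounds down (integer
# division) produce tensors of different lengths, so the skip-connection
# concat fails with "Expected size 64 but got size 63".
frames = 127                  # odd latent frame count from a very short clip
skip = math.ceil(frames / 2)  # encoder skip tensor: 64 frames
h = frames // 2               # decoder tensor after down/upsampling: 63 frames
print(skip, h)                # 64 63
```

Longer inputs produce frame counts that divide evenly through every downsampling stage, which is consistent with the workarounds reported below.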

This could be related to the length of the audio, which is only 0.936 seconds. As a test, I used LosslessCut to append another mp3 file with the exact same configuration, without re-encoding, keeping the same sample rate, bitrate, etc. Running audiosr on that longer file produced no error.

yuzuda283 commented 8 months ago

same question

Susukerow45 commented 8 months ago

Try a 0.512-second WAV file.

DrBrule commented 8 months ago

Ran into this as well. It seems to be related to the audio file being too short. If you pad the input audio array with some trailing zeros, it should work.
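A minimal sketch of that workaround. The function name and the safe minimum length are assumptions, not part of the AudioSR API (0.936 s failed in this thread, so ~1 s is used as a guess); adjust `min_seconds` if you still hit the error:

```python
import numpy as np

def pad_to_min_length(waveform: np.ndarray, sr: int = 48000,
                      min_seconds: float = 1.024) -> np.ndarray:
    """Append trailing zeros so the clip is at least `min_seconds` long.

    Very short clips can yield a latent frame count that breaks the
    UNet skip-connection concat; padding the raw audio avoids that.
    """
    min_len = int(sr * min_seconds)
    if len(waveform) >= min_len:
        return waveform  # already long enough, leave untouched
    return np.pad(waveform, (0, min_len - len(waveform)))

# Example: a 0.936 s clip at 48 kHz gets padded up to 1.024 s.
clip = np.zeros(int(48000 * 0.936), dtype=np.float32)
padded = pad_to_min_length(clip)
print(len(clip), len(padded))  # 44928 49152
```

You can then trim the trailing silence off the upsampled output if the exact original duration matters.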