slp-rl / aero

This repo contains the official PyTorch implementation of "Audio Super Resolution in the Spectral Domain" (ICASSP 2023)
MIT License

RuntimeError: Given groups=1, weight of size [48, 2, 1, 1], expected input[1, 4, 256, 7501] to have 2 channels, but got 4 channels instead #3

Closed FurkanGozukara closed 1 year ago

FurkanGozukara commented 1 year ago

I am trying to run the 12-48 / aero-nfft=512-hl=256 model, although I have no idea what 512 and 256 mean.

I installed all the requirements.

I ran my command like this and got an error:

python predict.py dset=4-16 experiment=aero_4-16_512_256 +filename="D:\86 se courses youtube kanali\aero\5dk.mp3" +output="D:\86 se courses youtube kanali\aero\5_v2dk.mp3" checkpoint_file="D:\86 se courses youtube kanali\aero\checkpoint.th"

I want to improve the quality of this 5-minute audio clip: https://sndup.net/stjs/

(env) D:\86 se courses youtube kanali\aero>python predict.py dset=4-16 experiment=aero_4-16_512_256 +filename="D:\86 se courses youtube kanali\aero\5dk.mp3" +output="D:\86 se courses youtube kanali\aero\5_v2dk.mp3" checkpoint_file="D:\86 se courses youtube kanali\aero\checkpoint.th"
D:\86 se courses youtube kanali\aero\env\lib\site-packages\hydra\_internal\defaults_list.py:251: UserWarning: In 'main_config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
{'experiment': {'name': 'aero-nfft=${experiment.nfft}-hl=${experiment.hop_length}', 'lr_sr': 4000, 'hr_sr': 16000, 'segment': 2, 'stride': 2, 'pad': True, 'upsample': False, 'batch_size': 16, 'nfft': 512, 'hop_length': 256, 'model': 'aero', 'aero': {'in_channels': 1, 'out_channels': 1, 'channels': 48, 'growth': 2, 'nfft': '${experiment.nfft}', 'hop_length': '${experiment.hop_length}', 'end_iters': 0, 'cac': True, 'rewrite': True, 'hybrid': False, 'hybrid_old': False, 'freq_emb': 0.2, 'emb_scale': 10, 'emb_smooth': True, 'kernel_size': 8, 'strides': [4, 4, 2, 2], 'context': 1, 'context_enc': 0, 'freq_ends': 4, 'enc_freq_attn': 0, 'norm_starts': 2, 'norm_groups': 4, 'dconv_mode': 1, 'dconv_depth': 2, 'dconv_comp': 4, 'dconv_time_attn': 2, 'dconv_lstm': 2, 'dconv_init': 0.001, 'rescale': 0.1, 'lr_sr': '${experiment.lr_sr}', 'hr_sr': '${experiment.hr_sr}', 'spec_upsample': True, 'act_func': 'snake', 'debug': False}, 'adversarial': True, 'features_loss_lambda': 100, 'only_features_loss': False, 'only_adversarial_loss': False, 'discriminator_models': ['msd_melgan'], 'melgan_discriminator': {'n_layers': 4, 'num_D': 3, 'downsampling_factor': 4, 'ndf': 16}}, 'dset': {'name': '4-16', 'train': 'egs/vctk/4-16/tr', 'valid': None, 'test': 'egs/vctk/4-16/val'}, 'num_prints': 5, 'device': 'cuda', 'num_workers': 2, 'verbose': 0, 'show': 0, 'log_results': True, 'checkpoint': True, 'continue_from': '', 'continue_best': False, 'restart': False, 'checkpoint_file': 'D:\\86 se courses youtube kanali\\aero\\checkpoint.th', 'best_file': 'best.th', 'history_file': 'history.json', 'test_results_file': 'test_results.json', 'samples_dir': 'samples', 'keep_history': True, 'seed': 2036, 'dummy': '', 'visqol': True, 'visqol_path': None, 'eval_every': 25, 'enhance_samples_limit': -1, 'valid_equals_test': None, 'cross_valid': False, 'cross_valid_every': 5, 'joint_evaluate_and_enhance': True, 'evaluate_on_best': False, 'wandb': {'project_name': 'Spectral Bandwidth Extension', 'entity': None, 'mode': 
'online', 'log': 'all', 'log_freq': 5, 'n_files_to_log': 10, 'n_files_to_log_to_table': 10, 'tags': [], 'resume': False}, 'optim': 'adam', 'lr': 0.0003, 'beta1': 0.8, 'beta2': 0.999, 'losses': ['stft'], 'stft_sc_factor': 0.5, 'stft_mag_factor': 0.5, 'epochs': 125, 'ddp': False, 'ddp_backend': 'nccl', 'rendezvous_file': './rendezvous', 'rank': None, 'world_size': None, 'filename': 'D:\\86 se courses youtube kanali\\aero\\5dk.mp3', 'output': 'D:\\86 se courses youtube kanali\\aero\\5_v2dk.mp3'}
[2023-02-09 14:36:36,703][__main__][INFO] - Loading model aero from last state.
[2023-02-09 14:36:38,679][__main__][INFO] - lr wav shape: torch.Size([2, 14400000])
[2023-02-09 14:36:38,680][__main__][INFO] - number of chunks: 30
Error executing job with overrides: ['dset=4-16', 'experiment=aero_4-16_512_256', '+filename=D:\\86 se courses youtube kanali\\aero\\5dk.mp3', '+output=D:\\86 se courses youtube kanali\\aero\\5_v2dk.mp3', 'checkpoint_file=D:\\86 se courses youtube kanali\\aero\\checkpoint.th']
Traceback (most recent call last):
  File "D:\86 se courses youtube kanali\aero\predict.py", line 77, in main
    pr_chunk = model(lr_chunk.unsqueeze(0).to(device)).squeeze(0)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\aero\src\models\aero.py", line 472, in forward
    x = encode(x, inject)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\aero\src\models\aero.py", line 120, in forward
    x = self.pre_conv(x)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [48, 2, 1, 1], expected input[1, 4, 256, 7501] to have 2 channels, but got 4 channels instead

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

m-mandel commented 1 year ago

Your audio should be mono. Is it mono (single channel) or stereo (2 channels)? The log line `lr wav shape: torch.Size([2, 14400000])` suggests it has 2 channels. If it is not mono, you should convert it to mono first (you can use any tool, e.g. sox or ffmpeg).
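
For anyone hitting the same error, here is a minimal sketch of the downmix step in Python. It is not part of the repo; `to_mono` and `conv_input_channels` are hypothetical helper names, and the example uses NumPy arrays shaped (channels, samples) rather than real audio. The second helper assumes that `'cac': True` in the config stacks the real and imaginary spectrogram parts as channels, which would explain why a stereo input arrives at the first convolution with 4 channels instead of the expected 2.

```python
import numpy as np


def to_mono(wav: np.ndarray) -> np.ndarray:
    """Average all channels of a (channels, samples) array into one channel."""
    if wav.ndim == 1:
        # Already a flat mono signal; just add the channel axis.
        return wav[np.newaxis, :]
    return wav.mean(axis=0, keepdims=True)


def conv_input_channels(audio_channels: int, cac: bool = True) -> int:
    """Channels seen by the first conv layer.

    Assumption: with cac=True, each audio channel contributes a real and
    an imaginary channel, so the count doubles.
    """
    return audio_channels * 2 if cac else audio_channels


# Stand-in for a stereo clip: left channel all ones, right channel all zeros.
stereo = np.stack([np.ones(8), np.zeros(8)])
mono = to_mono(stereo)

print(stereo.shape, "->", mono.shape)
print("stereo conv channels:", conv_input_channels(2))
print("mono conv channels:", conv_input_channels(1))
```

On a PyTorch tensor with the same (channels, samples) layout, the equivalent downmix is `wav.mean(dim=0, keepdim=True)`. Averaging keeps the level comparable to the original; simply dropping one channel would also give a mono signal but discards half of the recording.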