haoheliu / versatile_audio_super_resolution

Versatile audio super resolution (any -> 48kHz) with AudioSR.
MIT License

Output sample sound bad #4

Open rikabi89 opened 1 year ago

rikabi89 commented 1 year ago

I've uploaded my output, generated from your 8 kHz sample at https://audioldm.github.io/audiosr/: speech_AudioSR_Processed_48K.zip

But mine sounds terrible compared to yours, and I'm not sure why.

This is what I ran: python audiosr -i .\versatile_audio_super_resolution\example\speech.wav --model_name speech (I tried the default as well)

Loading AudioSR: speech
Loading model on cuda:0
H:\anaconda3\envs\audiosr\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
H:\anaconda3\envs\audiosr\lib\site-packages\torchaudio\transforms\_transforms.py:611: UserWarning: Argument 'onesided' has been deprecated and has no influence on the behavior of this module.
  warnings.warn(
DiffusionWrapper has 258.20 M params.
H:\anaconda3\envs\audiosr\lib\site-packages\audiosr\latent_diffusion\models\ddpm.py:237: RuntimeWarning: divide by zero encountered in divide
  "sqrt_recip_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod))
H:\anaconda3\envs\audiosr\lib\site-packages\audiosr\latent_diffusion\models\ddpm.py:240: RuntimeWarning: divide by zero encountered in divide
  "sqrt_recipm1_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod - 1))
H:\anaconda3\envs\audiosr\lib\site-packages\audiosr\utils.py:109: FutureWarning: Pass sr=48000, n_fft=2048, n_mels=256, fmin=20, fmax=24000 as keyword args. From version 0.10 passing these as positional arguments will result in an error
  mel = librosa_mel_fn(sampling_rate, filter_length, n_mel, mel_fmin, mel_fmax)
Running DDIM Sampling with 200 timesteps
DDIM Sampler: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:23<00:00,  8.62it/s]
Save audio to ./output\16_09_2023_02_29_50\speech_AudioSR_Processed_48K.wav

haoheliu commented 1 year ago

Hi, these are the results I got by running the same input file on:

A100: speech_up_2_AudioSR_Processed_48K.wav.zip

MPS: speech_up_2_AudioSR_Processed_48K.wav.zip

They generally look fine. I will look into the issue further.

haoheliu commented 1 year ago

Maybe trying different seeds can help? Below are the results I got with three different seeds.

different_seed.zip
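
For anyone who wants to repeat the seed comparison, here is a minimal sketch that builds one command line per seed, so each output can be traced back to the seed that produced it. The launcher (python audiosr) and the -i and --model_name options are taken from the command earlier in this thread. The --seed option is an assumption about the CLI, and build_seed_commands is a hypothetical helper name, so check the tool's help output before relying on either. The script only prints the commands and does not run the model.

```python
import shlex


def build_seed_commands(input_path, seeds, model_name="speech",
                        launcher=("python", "audiosr")):
    """Return one command (as an argument list) per seed.

    `launcher`, `-i` and `--model_name` mirror the command shown in this thread.
    `--seed` is an ASSUMED option name; verify it against the CLI help.
    """
    commands = []
    for seed in seeds:
        commands.append([
            *launcher,
            "-i", input_path,
            "--model_name", model_name,
            "--seed", str(seed),  # assumption: the CLI accepts --seed
        ])
    return commands


if __name__ == "__main__":
    # Print each seed next to its command so the seed behind every
    # output file is recorded.
    for seed, cmd in zip([0, 42, 1234],
                         build_seed_commands("example/speech.wav", [0, 42, 1234])):
        print(f"seed {seed}: {shlex.join(cmd)}")
```

Each printed line can be pasted into a shell. Keeping the printed list next to the output folders gives a simple record of which seed produced which file.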

rikabi89 commented 1 year ago

There's still crackling here, but your demo is crystal clear. Is there a way to know which seed was used for a given output?