Open Cunfu-Zhuge opened 6 months ago
I found that the "model" parameter in the official pretrained checkpoint is set to "hdemucs-snake-ftb-lstm-peg-concat", but this value is not supported by the code. The only values the code accepts for this parameter are "Aero" and "SEAnet".
I trained the model myself and found a line of artifacts at the boundary between the existing and extended frequency bands in its output. I read the paper associated with this repo, and none of the three causes it gives for this phenomenon applied to my training process, so I don't know why it happens. My guess is that it comes from how I converted the VCTK dataset from .flac to .wav. Could you please tell me how you converted .flac to .wav? Thank you!
I also ran into "a line of artifacts at the verge between existing and extended frequency bands" with the model I trained. Did you manage to tackle this problem? Any suggestions?
I had the same issue, although I am not converting .flac to .wav, so I would rule that out. I have partially fixed it by adding variance to the way I downsample my training examples (I now use 10 different downsampling methods).
It works nicely for clean voice samples, but as soon as there is a bit of noise or distortion in the voice, I again get a frequency boost at the boundary between the existing and extended frequency bands.
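I don't know exactly which 10 methods the commenter used, but the idea of randomizing the downsampling path can be sketched with `scipy` like this (the specific method list and `random_downsample` helper are my own illustration, not from this repo):

```python
import numpy as np
from scipy.signal import decimate, resample, resample_poly

def random_downsample(audio: np.ndarray, factor: int,
                      rng: np.random.Generator) -> np.ndarray:
    """Downsample `audio` by an integer `factor`, picking one of several
    anti-aliasing/resampling methods at random so the model cannot
    overfit to the cutoff shape of a single low-pass filter."""
    n_out = len(audio) // factor
    method = rng.integers(0, 4)
    if method == 0:
        # FFT-based resampling (near brick-wall low-pass)
        return resample(audio, n_out)
    elif method == 1:
        # polyphase FIR resampling
        return resample_poly(audio, up=1, down=factor)
    elif method == 2:
        # Chebyshev IIR anti-aliasing filter, then decimation
        return decimate(audio, factor, ftype="iir")
    else:
        # FIR anti-aliasing filter, then decimation
        return decimate(audio, factor, ftype="fir")
```

Each method leaves a slightly different amount of energy near the cutoff, which is presumably why mixing them discourages the model from placing a sharp boost right at the band boundary.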
Any update on this issue?
Could you tell me how you "implemented variance in the way I downsample my training examples (Now using 10 different downsampling methods)"? I ran into this problem too.