Open Cunfu-Zhuge opened 6 months ago
I found that the "model" parameter in the official pretrained checkpoint is set to "hdemucs-snake-ftb-lstm-peg-concat", but this value is not supported by the code. The only values the code accepts for this parameter are "Aero" and "SEAnet".
I trained the model myself and found a line of artifacts at the boundary between the existing and extended frequency bands in its output. I read the paper associated with this repo, and none of the three causes it gives for this phenomenon applied to my training process, so I don't know why it happens. My guess is that it comes from how I converted the VCTK dataset from .flac to .wav. Could you please tell me how you converted .flac to .wav? Thank you!
I also ran into "a line of artifacts at the verge between existing and extended frequency bands" with the model I trained. Did you manage to tackle this problem? Any suggestions?
I had the same issue, although I am not converting .flac to .wav, so I would rule that out. I have partially fixed it by adding variance to the way I downsample my training examples (I now use 10 different downsampling methods).
It works nicely for clean voice samples, but as soon as there is a bit of noise or distortion in the voice, I again get a frequency boost at the boundary between the existing and extended frequency bands.
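I don't know exactly which 10 methods the commenter used, but the idea of randomizing the downsampling path can be sketched with `scipy` like this (the specific method list and `random_downsample` helper are my own illustration, not from this repo):

```python
import numpy as np
from scipy.signal import decimate, resample, resample_poly

def random_downsample(audio: np.ndarray, factor: int,
                      rng: np.random.Generator) -> np.ndarray:
    """Downsample `audio` by an integer `factor`, picking one of several
    anti-aliasing/resampling methods at random so the model cannot
    overfit to the cutoff shape of a single low-pass filter."""
    n_out = len(audio) // factor
    method = rng.integers(0, 4)
    if method == 0:
        # FFT-based resampling (near brick-wall low-pass)
        return resample(audio, n_out)
    elif method == 1:
        # polyphase FIR resampling
        return resample_poly(audio, up=1, down=factor)
    elif method == 2:
        # Chebyshev IIR anti-aliasing filter, then decimation
        return decimate(audio, factor, ftype="iir")
    else:
        # FIR anti-aliasing filter, then decimation
        return decimate(audio, factor, ftype="fir")
```

Each method leaves a slightly different amount of energy near the cutoff, which is presumably why mixing them discourages the model from placing a sharp boost right at the band boundary.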
Any update on this issue?
Could you tell me how you "implemented variance in the way I downsample my training examples (Now using 10 different downsampling methods)"? I ran into this problem too.