pfnet-research / meta-tasnet

A PyTorch implementation of Meta-TasNet from "Meta-learning Extractors for Music Source Separation"

Provide trained model in higher resolution #4

Open FSharpCSharp opened 4 years ago

FSharpCSharp commented 4 years ago

I have now carried out extensive tests with the model. Unfortunately, I found that the output signal is always cut off at 22050 Hz, even though the output could theoretically have a resolution of 32000 Hz. This means the signal does not cover the full range it actually could.

Is this due to the learned model or to some additional setting? I proceeded as described in the Python notebook and double-checked everything. Unfortunately, the quality is not as good as it could be because of the 22050 Hz output. Here is a short explanation.
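
For anyone who wants to reproduce this check, here is a minimal sketch (not part of the repo) that reads a separated stem and reports its sample rate and how much spectral energy survives above the apparent cut-off; the file name and the 11 kHz threshold are placeholders.

```python
# Minimal sketch: verify the effective bandwidth of a separated stem.
# "separated_vocals.wav" and the 11 kHz cut-off are placeholder assumptions.
import numpy as np
import soundfile as sf

audio, sr = sf.read("separated_vocals.wav")   # hypothetical output file
if audio.ndim > 1:
    audio = audio.mean(axis=1)                # mix down to mono for the check

spectrum = np.abs(np.fft.rfft(audio))
freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)

cutoff = 11000.0                              # roughly where the energy seems to stop
high = spectrum[freqs >= cutoff].sum()
total = spectrum.sum()
print(f"sample rate: {sr} Hz")
print(f"energy above {cutoff / 1000:.1f} kHz: {100 * high / total:.4f} %")
```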

davda54 commented 4 years ago

Hi, you're right, there's a clear cut-off after 10 kHz, but we're unsure about its cause. It seems to be an internal property of the neural network. Please let me know if you catch the bug :)

[spectrogram_cutoff: attached spectrogram showing the cut-off above ~10 kHz]
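
To reproduce a plot like the attached one, a minimal sketch along these lines should work, assuming librosa and matplotlib; the file name is a placeholder.

```python
# Minimal sketch (not from the repo): plot a spectrogram of a separated stem
# to make the cut-off visible. "separated_vocals.wav" is a placeholder.
import matplotlib.pyplot as plt
import numpy as np
import librosa
import librosa.display

y, sr = librosa.load("separated_vocals.wav", sr=None)   # keep native sample rate
S = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=2048)), ref=np.max)

librosa.display.specshow(S, sr=sr, x_axis="time", y_axis="hz")
plt.colorbar(format="%+2.0f dB")
plt.title("Separated stem (note the energy drop above ~10 kHz)")
plt.tight_layout()
plt.show()
```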

RadioAngurem commented 4 years ago

I have downloaded the MusDB tracks and checked that the sum of the stems is not equal to the stereo mix. Not only that, the difference between the stereo mix and the sum of the stems is too big to ignore: you can identify the song just by listening to that difference. So I believe that computing the weights of the TCN mask using MusDB instead of MusDB HQ adds an error to the model.
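
For reference, this is roughly how the mixture-vs-stems difference can be measured, assuming the `musdb` package and a local MUSDB18 download; the path and track index are placeholders.

```python
# Minimal sketch: measure how far the sum of the stems is from the stereo mixture.
import numpy as np
import musdb

mus = musdb.DB(root="path/to/musdb18", subsets="train")  # placeholder path
track = mus.tracks[0]                                    # any track works

mixture = track.audio                                    # (samples, 2)
stem_sum = sum(track.targets[name].audio
               for name in ("drums", "bass", "other", "vocals"))

residual = mixture - stem_sum
rms_residual = np.sqrt(np.mean(residual ** 2))
rms_mixture = np.sqrt(np.mean(mixture ** 2))
print(f"residual RMS relative to mixture: {rms_residual / rms_mixture:.3%}")
```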

Also, why not try adding another stage to the network? MusDB cuts the frequencies above 16 kHz, so the model is not trained to work with audio that has information above that frequency. Could a network with these parameters work?

- S = 10; 1, T/2, sr = 6000 Hz
- S = 20; 1, T, sr = 12000 Hz
- S = 40; 1, 2T, sr = 24000 Hz
- S = 80; 1, 4T, sr = 48000 Hz
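
Just to illustrate the proposal, here is a minimal sketch of preparing inputs at the four suggested sample rates, assuming torchaudio; the mixture path is a placeholder, and the S/T stage sizes, which would shape the separator itself, are not reproduced here.

```python
# Minimal sketch: resample one mixture to the four proposed stage sample rates.
# "mixture.wav" is a placeholder; this is not the Meta-TasNet training code.
import torchaudio

waveform, orig_sr = torchaudio.load("mixture.wav")
stage_rates = [6000, 12000, 24000, 48000]                # sr per proposed stage

inputs = []
for target_sr in stage_rates:
    resample = torchaudio.transforms.Resample(orig_freq=orig_sr, new_freq=target_sr)
    inputs.append(resample(waveform))

for sr, x in zip(stage_rates, inputs):
    print(f"{sr:>5d} Hz -> {tuple(x.shape)}")
```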

coincoin73 commented 4 years ago


Has anyone tried the configuration suggested above?

JeffreyCA commented 3 years ago

There was a similar issue with Spleeter, where high frequencies are missing from the generated output files. Here's their explanation: https://github.com/deezer/spleeter/wiki/5.-FAQ#why-are-there-no-high-frequencies-in-the-generated-output-files-

@davda54 Could this issue be similar to that?