cameronfr opened 7 months ago
Thanks for pointing this out. This is true. There is actually no performance difference between these two.
Ah, thank you. To clarify, the mel input in question had ~128 channels?
How can I optimize the audio cloning process? How can I make a change to the extract_se function?
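One generic way to speed up repeated cloning from the same reference voice is to compute the tone color embedding once and reuse it, instead of re-running the extractor on every request. The sketch below assumes a hypothetical `embed_fn` that takes raw reference-audio bytes and returns a vector; it is not the real extract_se signature, just a stand-in you could wrap around whatever extraction call you use.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical embedding type: a plain list of floats for this sketch.
Embedding = List[float]


class ToneColorCache:
    """Memoize a (hypothetical) embedding function by a hash of the reference audio bytes.

    `embed_fn` is an assumption standing in for an extract_se-style routine;
    it is not the project's actual API.
    """

    def __init__(self, embed_fn: Callable[[bytes], Embedding]) -> None:
        self._embed_fn = embed_fn
        self._cache: Dict[str, Embedding] = {}
        self.misses = 0  # how many times the expensive extractor actually ran

    def get(self, audio_bytes: bytes) -> Embedding:
        # Identical reference clips hash to the same key.
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key not in self._cache:
            # Only run the expensive extractor the first time this clip is seen.
            self.misses += 1
            self._cache[key] = self._embed_fn(audio_bytes)
        return self._cache[key]


if __name__ == "__main__":
    def fake_embed(audio: bytes) -> Embedding:
        # Placeholder "extractor": mean byte value, for demonstration only.
        return [sum(audio) / max(len(audio), 1)]

    cache = ToneColorCache(fake_embed)
    clip = bytes([1, 2, 3, 4])
    first = cache.get(clip)
    second = cache.get(clip)
    print(first, second, cache.misses)  # [2.5] [2.5] 1
```

The design choice here is to key on the audio content rather than the file path, so a renamed copy of the same clip still hits the cache; persisting the dictionary to disk would extend the savings across runs.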
The paper mentions that
The tone color extractor is a simple 2D convolutional neural network that operates on the mel-spectrogram of the input voice and outputs a single feature vector that encodes the tone color information.
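The paper's description (a 2D convolution over a spectrogram, reduced to a single vector) can be illustrated with a toy pure-Python sketch. This is not the OpenVoice network: the single conv layer, the kernel sizes, and the use of global average pooling to collapse frequency and time are assumptions chosen only to show how a variable-length spectrogram can become a fixed-size tone color vector.

```python
from typing import List

Matrix = List[List[float]]  # spectrogram laid out as [freq][time]


def conv2d_valid(x: Matrix, k: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation, which is what deep-learning conv layers compute."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += x[i + a][j + b] * k[a][b]
            row.append(acc)
        out.append(row)
    return out


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def global_avg_pool(m: Matrix) -> float:
    # Averaging over both frequency and time collapses a clip of any length to one number.
    n = sum(len(row) for row in m)
    return sum(sum(row) for row in m) / n


def tone_color_vector(spec: Matrix, kernels: List[Matrix]) -> List[float]:
    """Toy extractor: one conv layer + ReLU + global average pooling per kernel.

    Each kernel contributes one output dimension, so the vector length is
    len(kernels) regardless of how many time frames the input has.
    """
    return [global_avg_pool(relu(conv2d_valid(spec, k))) for k in kernels]


if __name__ == "__main__":
    kernels = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    short_clip = [[1.0] * 4 for _ in range(3)]
    long_clip = [[1.0] * 8 for _ in range(3)]
    print(tone_color_vector(short_clip, kernels))  # [4.0, 0.0]
    print(tone_color_vector(long_clip, kernels))   # [4.0, 0.0]
```

Both clips map to a two-element vector even though one has twice as many frames, which is the property that lets a single embedding summarize a whole reference recording.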
However, in api.py I see that it appears to operate on the non-mel spectrogram. I'm wondering whether this is true and, if so, whether there was a reason for using the non-mel spectrogram (was the quality better)?
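For context on the mel vs. non-mel question: a mel spectrogram is the linear (non-mel) magnitude spectrogram multiplied by a bank of triangular filters, so it has fewer, perceptually spaced channels. The sketch below uses the HTK mel formula and illustrative parameters (sample rate 22050 Hz, n_fft 1024, 128 mel channels); these are assumptions for illustration, not values taken from api.py or the project's config.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale.
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> List[List[float]]:
    """Triangular filters mapping n_fft // 2 + 1 linear bins to n_mels mel channels."""
    n_bins = n_fft // 2 + 1
    bin_freqs = [k * sr / n_fft for k in range(n_bins)]
    mel_max = hz_to_mel(sr / 2)
    # n_mels + 2 edge points, evenly spaced on the mel scale from 0 Hz to Nyquist.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if lo <= f <= center:
                w = (f - lo) / (center - lo)  # rising edge of the triangle
            elif center < f <= hi:
                w = (hi - f) / (hi - center)  # falling edge of the triangle
            else:
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank


def linear_to_mel(spec: List[List[float]], bank: List[List[float]]) -> List[List[float]]:
    """spec is [n_bins][n_frames]; returns [n_mels][n_frames]."""
    n_frames = len(spec[0])
    return [
        [sum(w * spec[k][t] for k, w in enumerate(filt)) for t in range(n_frames)]
        for filt in bank
    ]


if __name__ == "__main__":
    bank = mel_filterbank(sr=22050, n_fft=1024, n_mels=128)
    print(len(bank), len(bank[0]))  # 128 513
```

With these settings the linear spectrogram has 1024 // 2 + 1 = 513 frequency bins per frame while the mel version has 128, so the input channel count of the extractor depends on which spectrogram it is fed.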