Closed: JuanFMontesinos closed this 2 years ago
The code for training this model lives in https://github.com/CorentinJ/Real-Time-Voice-Cloning. This repo is for inference only.
Only the forward function is written in PyTorch. The only use for autograd here would be backpropagating through forward() for a loss function.
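For concreteness, a minimal sketch of that use case: since `forward()` is pure PyTorch, a loss on the embeddings can already be backpropagated to a mel-spectrogram input; only the numpy front end that produces the mels sits outside the graph. The shapes below follow Resemblyzer's default of 40 mel channels, and the similarity-style loss is an illustrative placeholder, not part of the library:

```python
import torch
from resemblyzer import VoiceEncoder

encoder = VoiceEncoder("cpu")  # loads the pretrained weights shipped with the package

# Stand-in mel batch with shape (batch, n_frames, n_channels);
# 40 channels matches Resemblyzer's default front end.
mels = torch.randn(4, 160, 40, requires_grad=True)

embeds = encoder(mels)                   # forward() is pure PyTorch, so this is differentiable
loss = (1 - embeds @ embeds.t()).mean()  # placeholder similarity-style loss
loss.backward()                          # gradients flow back to the mel input
print(mels.grad.shape)                   # torch.Size([4, 160, 40])
```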
Thanks, though that repo has been abandoned. Anyway, keeping a backpropagable version can be interesting for other applications.
Regards
Rewrite the model in `resemblyzer/voice_encoder.py` to be fully dependent on PyTorch (thus trainable)

I don't understand why your model uses numpy, which breaks backpropagation. This basically rewrites `voice_encoder.py` so that the mel spectrogram is computed with PyTorch and the model can be used end-to-end given a waveform (see the sketch after the change list). Changes:

- `embed_utterance` now expects a PyTorch tensor and works both batched and non-batched.
- `VoiceEncoder` (an `nn.Module`) has been extended with a new layer in charge of computing the mel spectrogram.
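For illustration, here is a minimal sketch of that idea, assuming torchaudio's `MelSpectrogram` as the torch-native front end. The wrapper class name is hypothetical (not the actual code from this change), and the parameters mirror Resemblyzer's defaults: 16 kHz audio, 25 ms (400-sample) windows, 10 ms (160-sample) hop, 40 mel channels.

```python
import torch
import torch.nn as nn
import torchaudio
from resemblyzer import VoiceEncoder

class DifferentiableVoiceEncoder(nn.Module):
    """Hypothetical wrapper: a torch-native mel layer in front of VoiceEncoder,
    so gradients can flow from the embedding back to the raw waveform."""

    def __init__(self):
        super().__init__()
        # Approximates Resemblyzer's librosa front end (16 kHz, 25 ms window,
        # 10 ms hop, 40 mel channels) as a differentiable torch layer.
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=400, hop_length=160, n_mels=40)
        self.encoder = VoiceEncoder("cpu")

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # Accept both non-batched (n_samples,) and batched (batch, n_samples) input.
        if wav.dim() == 1:
            wav = wav.unsqueeze(0)
        mels = self.melspec(wav)     # (batch, n_mels, n_frames)
        mels = mels.transpose(1, 2)  # (batch, n_frames, n_mels), as forward() expects
        return self.encoder(mels)    # (batch, embedding_size)

model = DifferentiableVoiceEncoder()
wav = torch.randn(16000, requires_grad=True)  # one second of fake 16 kHz audio
embed = model(wav)
embed.sum().backward()                        # gradients reach the raw waveform
```

Note that librosa (used by the current numpy front end) and torchaudio default to different mel conventions (Slaney vs. HTK scale and normalization), so a drop-in torch layer will not reproduce the original features exactly unless those options are matched.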