Closed: JuanFMontesinos closed this 2 years ago
The code for training this model lives in https://github.com/CorentinJ/Real-Time-Voice-Cloning. This repo is for inference only.
Only the forward function is written in PyTorch. The only use for autograd here would be backpropagating through forward() for a loss function.
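For concreteness, a minimal sketch of that use case: since `forward()` is pure PyTorch, a loss on the embeddings can already be backpropagated to a mel-spectrogram input; only the numpy front end that produces the mels sits outside the graph. The shapes below follow Resemblyzer's default of 40 mel channels, and the similarity-style loss is an illustrative placeholder, not part of the library:

```python
import torch
from resemblyzer import VoiceEncoder

encoder = VoiceEncoder("cpu")  # loads the pretrained weights shipped with the package

# Stand-in mel batch with shape (batch, n_frames, n_channels);
# 40 channels matches Resemblyzer's default front end.
mels = torch.randn(4, 160, 40, requires_grad=True)

embeds = encoder(mels)                   # forward() is pure PyTorch, so this is differentiable
loss = (1 - embeds @ embeds.t()).mean()  # placeholder similarity-style loss
loss.backward()                          # gradients flow back to the mel input
print(mels.grad.shape)                   # torch.Size([4, 160, 40])
```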
Thanks, though that repo has been abandoned. Anyway, keeping a backpropagable version can be interesting for other applications.
Regards
Rewrite the model in `resemblyzer/voice_encoder.py` to be fully dependent on PyTorch (thus trainable)

I don't understand why your model uses numpy, which breaks backpropagation. This basically rewrites `voice_encoder.py` so that the mel spectrogram is computed with PyTorch and the model can be used end-to-end given a waveform (see the sketch after the change list). Changes:

- `embed_utterance` now expects a PyTorch tensor and works both batched and non-batched.
- `VoiceEncoder` (an `nn.Module`) has been extended with a new layer in charge of computing the mel spectrogram.
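For illustration, here is a minimal sketch of that idea, assuming torchaudio's `MelSpectrogram` as the torch-native front end. The wrapper class name is hypothetical (not the actual code from this change), and the parameters mirror Resemblyzer's defaults: 16 kHz audio, 25 ms (400-sample) windows, 10 ms (160-sample) hop, 40 mel channels.

```python
import torch
import torch.nn as nn
import torchaudio
from resemblyzer import VoiceEncoder

class DifferentiableVoiceEncoder(nn.Module):
    """Hypothetical wrapper: a torch-native mel layer in front of VoiceEncoder,
    so gradients can flow from the embedding back to the raw waveform."""

    def __init__(self):
        super().__init__()
        # Approximates Resemblyzer's librosa front end (16 kHz, 25 ms window,
        # 10 ms hop, 40 mel channels) as a differentiable torch layer.
        self.melspec = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=400, hop_length=160, n_mels=40)
        self.encoder = VoiceEncoder("cpu")

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # Accept both non-batched (n_samples,) and batched (batch, n_samples) input.
        if wav.dim() == 1:
            wav = wav.unsqueeze(0)
        mels = self.melspec(wav)     # (batch, n_mels, n_frames)
        mels = mels.transpose(1, 2)  # (batch, n_frames, n_mels), as forward() expects
        return self.encoder(mels)    # (batch, embedding_size)

model = DifferentiableVoiceEncoder()
wav = torch.randn(16000, requires_grad=True)  # one second of fake 16 kHz audio
embed = model(wav)
embed.sum().backward()                        # gradients reach the raw waveform
```

Note that librosa (used by the current numpy front end) and torchaudio default to different mel conventions (Slaney vs. HTK scale and normalization), so a drop-in torch layer will not reproduce the original features exactly unless those options are matched.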