manojpamk / pytorch_xvectors

Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196
MIT License

Provide example for inference in Python #7

Closed (diego-fustes closed this issue 3 years ago)

diego-fustes commented 4 years ago

Hi, and many thanks for this nice work. I'm trying to integrate this code into my Python project to obtain embeddings from a given WAV file. From the source files I can easily see how you apply the network and get the embeddings. However, the nnet3 egs format that is being read needs to be computed by Kaldi. Is there an option to preprocess the file with a pure Python library? Could you document the exact shape of the MFCCs that the model expects? That way I could implement the feature extraction with librosa or a similar tool.

Thank you in advance

manojpamk commented 4 years ago

Hi,

Unfortunately, I haven't been able to work on a purely Pythonic audio -> egs pipeline. Do you need to train the network? If you only need the embeddings, the nnet3 egs format is not required. Check out egs/diarize.sh for an example.

Manoj
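
For anyone who wants to try the librosa route asked about above, here is a rough sketch of what the feature extraction could look like. It is not taken from this repository: the function names `extract_mfcc` and `expected_num_frames` are made up for illustration, and the 16 kHz sample rate, 30 coefficients, 25 ms window and 10 ms shift are assumptions that should be checked against the MFCC config actually used to train the model. Also note that librosa and Kaldi compute MFCCs differently (mel filterbank, dithering, liftering, energy handling), so the values will not match Kaldi's output exactly and the embedding quality may suffer.

```python
def expected_num_frames(num_samples, hop_length=160):
    """Number of frames librosa returns with its default center=True padding."""
    return 1 + num_samples // hop_length


def extract_mfcc(wav_path, sr=16000, n_mfcc=30, frame_ms=25, shift_ms=10):
    """Return MFCCs of shape (num_frames, n_mfcc) for a single WAV file.

    All default values are assumptions, not values confirmed by the repo.
    """
    import librosa  # imported here so the helper above works without librosa

    # Load as mono and resample to the assumed training sample rate.
    y, sr = librosa.load(wav_path, sr=sr)

    win_length = int(sr * frame_ms / 1000)  # 400 samples at 16 kHz
    hop_length = int(sr * shift_ms / 1000)  # 160 samples at 16 kHz

    # Extra keyword arguments are forwarded to the mel spectrogram step.
    mfcc = librosa.feature.mfcc(
        y=y,
        sr=sr,
        n_mfcc=n_mfcc,
        n_fft=512,
        win_length=win_length,
        hop_length=hop_length,
    )

    # librosa returns (n_mfcc, num_frames); transpose to frames-first layout.
    return mfcc.T
```

Calling `extract_mfcc("utt.wav")` on a hypothetical one-second 16 kHz file would give an array with `expected_num_frames(16000)` rows and 30 columns. Whether the network expects frames-first or features-first input, and whether mean normalization is applied beforehand, still has to be read from the repo's own extraction scripts.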