xinjli / allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
GNU General Public License v3.0
550 stars 86 forks

added the return_lstm and return_both options in the recognizer #36

Closed raotnameh closed 3 years ago

raotnameh commented 3 years ago

Hi @xinjli ,

I have added two arguments to the forward() function in /allosaurus/allosaurus/am/allosaurus_torch.py:

return_lstm: returns a list containing the output embeddings and their respective lengths.
return_both: returns a tuple of (the list containing the output embeddings and their respective lengths, the output of the phone layer).

This lets a user extract the embeddings for any downstream task, for example intent classification.
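For context, here is a minimal sketch of how such flags might be threaded through a forward() pass. This is a toy module of my own for illustration, not the actual allosaurus acoustic model (the real implementation lives in allosaurus/am/allosaurus_torch.py); layer sizes and names are made up:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class TinyAM(nn.Module):
    """Toy acoustic model illustrating the return_lstm / return_both flags."""

    def __init__(self, feat_dim=40, hidden=64, n_phones=230):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden)
        self.phone_layer = nn.Linear(hidden, n_phones)

    def forward(self, feats, feat_lens, return_lstm=False, return_both=False):
        # feats: [len, batch, feat_dim]; feat_lens sorted in descending order
        packed = pack_padded_sequence(feats, feat_lens.cpu())
        out, _ = self.lstm(packed)
        emb, lens = pad_packed_sequence(out)   # emb: [len, batch, hidden]
        if return_lstm:
            return [emb, lens]                 # embeddings + their lengths
        logits = self.phone_layer(emb)
        if return_both:
            return ([emb, lens], logits)       # embeddings and phone output
        return logits, lens
```

The key point is that the flags only change what is returned; the computation up to the LSTM output is shared.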

How to use it, given a list of paths to wav files:

import torch
import numpy as np
from torch.nn.utils.rnn import pad_sequence

from allosaurus.audio import read_audio
from allosaurus.app import read_recognizer
from allosaurus.am.utils import move_to_tensor

recognizer = read_recognizer()
wav_paths = ['/home/hemant/cmu/fluent_speech_commands_dataset/wavs/speakers/k5bqyxx2lzIbrlg9/16f1a930-452a-11e9-a843-8db76f4b5e29.wav',
 '/home/hemant/cmu/fluent_speech_commands_dataset/wavs/speakers/NgQEvO2x7Vh3xy2xz/5a9e2580-45bd-11e9-8ec0-7bf21d1cfe30.wav']

feats, feat_lens = [], []
for wav_path in wav_paths:

    feat = torch.tensor(recognizer.pm.compute(read_audio(wav_path))) # [len, features]
    feat_len = torch.tensor(np.array([feat.shape[0]], dtype=np.int32)) # 1D array holding the number of frames

    feats.append(feat)
    feat_lens.append(feat_len)

feats = pad_sequence(feats, batch_first=True, padding_value=0) # [batch, len, features]
feat_lens = pad_sequence(feat_lens, batch_first=True, padding_value=0).squeeze()
idx = torch.argsort(feat_lens, descending=True) # sort inputs by descending length, as required by the LSTMs in the AM
tensor_batch_feat, tensor_batch_feat_len = move_to_tensor([feats[idx], feat_lens[idx]], recognizer.config.device_id) # move to tensors on the configured device

# Features
output_tensor, input_lengths = recognizer.am(tensor_batch_feat, tensor_batch_feat_len, return_lstm=True) # output_shape: [len,batch,features]
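The padded, variable-length embeddings can then be pooled into one fixed-size vector per utterance for a downstream classifier. A minimal sketch with dummy tensors standing in for the recognizer.am(..., return_lstm=True) outputs (the mean_pool helper and the masking logic are my own, not part of allosaurus):

```python
import torch

def mean_pool(output, lengths):
    """Mean-pool padded [len, batch, feat] embeddings over valid timesteps only."""
    max_len, batch, feat = output.shape
    # mask[t, b] is True while frame t is within utterance b's real length
    mask = torch.arange(max_len).unsqueeze(1) < lengths.unsqueeze(0)  # [max_len, batch]
    masked = output * mask.unsqueeze(-1)              # zero out padding frames
    return masked.sum(dim=0) / lengths.unsqueeze(-1)  # [batch, feat]

# Dummy stand-ins for output_tensor / input_lengths from the call above
output_tensor = torch.randn(50, 2, 640)
input_lengths = torch.tensor([50, 32])
embeddings = mean_pool(output_tensor, input_lengths)  # [batch, feat] utterance vectors
```

Mean-pooling is only one option; taking the final valid hidden state per utterance works as well.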

LMK your thoughts on this. Thanks for open-sourcing the work. Really appreciate it.

xinjli commented 3 years ago

Hi Hemant,

Thanks for your pull request! It looks good to me!

Xinjian