Improvement Methods
We can use the 3-gram language model available through nemo_toolkit[asr] to resolve mismatches between similar-sounding words. This helps avoid incomprehensible words in the transcription produced during the ASR step and improves how well Rasa NLU understands the result.
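To illustrate the idea with a toy example (this is not NeMo code, and it uses a bigram rather than a trigram model for brevity): an n-gram language model assigns probabilities to word sequences, so between two acoustically confusable candidates the decoder can prefer the one that forms real phrases.

```python
import math
from collections import Counter

# Toy corpus to estimate bigram statistics from (illustrative only).
corpus = "the cat sat on the mat . the cat sat down .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def lm_score(sentence):
    """Sum of log bigram relative frequencies; an unseen bigram rules a candidate out."""
    words = sentence.split()
    total = 0.0
    for a, b in zip(words, words[1:]):
        count = bigrams[(a, b)]
        if count == 0:
            return float("-inf")
        total += math.log(count / unigrams[a])
    return total

# Two acoustically confusable candidates: the language model prefers the real phrase.
print(lm_score("the cat sat") > lm_score("the cat sad"))  # True
```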
Steps
Transcribe the audio files in log-probability mode:
logits = asr_model.transcribe(files, logprobs=True)[0]
import gzip
import os, shutil, wget
import numpy as np
import nemo.collections.asr as nemo_asr
lm_gzip_path = '3-gram.pruned.1e-7.arpa.gz'
if not os.path.exists(lm_gzip_path):
    print('Downloading pruned 3-gram model.')
    lm_url = 'http://www.openslr.org/resources/11/3-gram.pruned.1e-7.arpa.gz'
    lm_gzip_path = wget.download(lm_url)
    print('Downloaded the 3-gram language model.')
else:
    print('Pruned .arpa.gz already exists.')

uppercase_lm_path = '3-gram.pruned.1e-7.arpa'
if not os.path.exists(uppercase_lm_path):
    with gzip.open(lm_gzip_path, 'rb') as f_zipped:
        with open(uppercase_lm_path, 'wb') as f_unzipped:
            shutil.copyfileobj(f_zipped, f_unzipped)
    print('Unzipped the 3-gram language model.')
else:
    print('Unzipped .arpa already exists.')

lm_path = 'lowercase_3-gram.pruned.1e-7.arpa'
if not os.path.exists(lm_path):
    with open(uppercase_lm_path, 'r') as f_upper:
        with open(lm_path, 'w') as f_lower:
            for line in f_upper:
                f_lower.write(line.lower())
    print('Converted language model file to lowercase.')
beam_search_lm = nemo_asr.modules.BeamSearchDecoderWithLM(
    vocab=list(asr_model.decoder.vocabulary),
    beam_width=16,
    alpha=2, beta=1.5,
    lm_path=lm_path,
    num_cpus=max(os.cpu_count(), 1),
    input_tensor=False)
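The alpha and beta parameters control how the language model is fused with the acoustic scores. As a sketch of the standard LM-fused CTC beam-search formulation (the numbers below are made up for illustration): alpha weights the language-model score and beta rewards longer hypotheses, counteracting the LM's bias toward short outputs.

```python
# Sketch of how alpha and beta enter a hypothesis score in LM-fused
# CTC beam search (standard formulation; example values are made up).
alpha, beta = 2.0, 1.5

def fused_score(log_p_acoustic, log_p_lm, num_words):
    # alpha scales the language-model contribution; beta adds a
    # per-word bonus so longer transcriptions are not unfairly penalized.
    return log_p_acoustic + alpha * log_p_lm + beta * num_words

print(fused_score(-12.0, -4.0, 5))  # -12 + 2*(-4) + 1.5*5 = -12.5
```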
Run beam search with the 3-gram model over the probability distribution (the logits must first be converted to probabilities; scipy's softmax stands in here for the tutorial's own helper):
from scipy.special import softmax
probs = softmax(logits, axis=-1)
beam_search_lm.forward(log_probs=np.expand_dims(probs, axis=0), log_probs_length=None)
The output is an array of tuples sorted best-first: the first element of each tuple is the confidence score of the candidate transcription, and the second is the candidate text itself.
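For example, picking the final transcription from such an output could look like this (the scores and texts below are made up for illustration):

```python
# Hypothetical decoder output for one utterance: (score, transcription)
# tuples, already sorted best-first.
beam_results = [
    (-1.98, "the cat sat on the mat"),
    (-4.31, "the cat sad on the mat"),
]

# The best hypothesis is the first tuple; unpack its score and text.
best_score, best_text = beam_results[0]
print(best_text)  # the cat sat on the mat
```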
Source: https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/asr/Offline_ASR.ipynb