kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

fix conversion of ## type bpe format to _ type #23

Closed shashikg closed 2 years ago

shashikg commented 2 years ago

Hey,

There was a minor bug in the conversion of the ## type BPE format to the _ type. Basically, the conversion turns '<unk>' into '_<unk>', which should not be done.

Colab notebook demonstrating the issue: https://colab.research.google.com/drive/1Go2_7_ugRx5g0QuyJ4EeCOCdKBtjRLwk?usp=sharing

Colab notebook demonstrating the fix: https://colab.research.google.com/drive/1oOi6juTD_kBM9Z3h9Uw1C0tcQC_Ga5K5?usp=sharing
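
For readers who don't want to open the notebooks, here is a minimal sketch of the idea behind the fix. It is not the library's actual code: the function name `convert_hash_bpe`, its parameters, and the sample vocabulary are all hypothetical and only illustrate the behaviour described above. The word-start marker is written as "_" to match this PR's text; SentencePiece-style vocabularies normally use "▁" (U+2581), so pass that in if it applies to you.

```python
from typing import Iterable, List, Sequence


def convert_hash_bpe(
    labels: Iterable[str],
    word_start: str = "_",                # assumption: marker as written in this PR
    keep_as_is: Sequence[str] = ("<unk>",),  # assumption: tokens that must not be touched
) -> List[str]:
    """Convert '##'-style BPE labels to word-start-marker style (illustrative only)."""
    converted = []
    for token in labels:
        if token in keep_as_is:
            # Special tokens such as '<unk>' are passed through unchanged.
            converted.append(token)
        elif token.startswith("##"):
            # '##' marks a word continuation: drop the prefix, add no marker.
            converted.append(token[2:])
        else:
            # Anything else starts a new word: prepend the word-start marker.
            converted.append(word_start + token)
    return converted


vocab = ["<unk>", "he", "##llo", "wor", "##ld"]
print(convert_hash_bpe(vocab))
# ['<unk>', '_he', 'llo', '_wor', 'ld']
```

Without the `keep_as_is` check, '<unk>' would fall into the last branch and come out as '_<unk>', which is the bug described above.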