kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

[Decoder Dir Parsing] Allow decoder folder to contain additional files #37

Closed: patrickvonplaten closed this 2 years ago

patrickvonplaten commented 2 years ago

This PR slightly relaxes the structure that the decoder directory is allowed to have.

Before this PR: The decoder directory can essentially contain only the language_model dir and the alphabet.json file.

This PR: The decoder directory must contain a language_model dir and an alphabet.json file, but can also contain additional files at the top level.
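
To make the before/after difference concrete, here is a minimal sketch of the two checks. It is not the actual pyctcdecode code: the helper `check_decoder_dir` and its `strict` flag are hypothetical names used only for illustration, and only the `language_model` and `alphabet.json` names come from the description above.

```python
import os
import tempfile

# Names taken from the PR description; everything else here is hypothetical.
REQUIRED_DIR = "language_model"
REQUIRED_FILE = "alphabet.json"


def check_decoder_dir(path, strict=False):
    """Validate a decoder directory and return its extra top-level entries.

    strict=True models the old behaviour (nothing besides the two required entries).
    strict=False models the relaxed behaviour (extra top-level entries are tolerated).
    """
    if not os.path.isdir(os.path.join(path, REQUIRED_DIR)):
        raise ValueError(f"missing directory: {REQUIRED_DIR}")
    if not os.path.isfile(os.path.join(path, REQUIRED_FILE)):
        raise ValueError(f"missing file: {REQUIRED_FILE}")
    extras = set(os.listdir(path)) - {REQUIRED_DIR, REQUIRED_FILE}
    if strict and extras:
        raise ValueError(f"unexpected entries: {sorted(extras)}")
    return extras


# Build a throwaway decoder directory that also holds one extra file.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, REQUIRED_DIR))
    for name in (REQUIRED_FILE, "config.json"):
        open(os.path.join(tmp, name), "w").close()
    print(check_decoder_dir(tmp))  # relaxed check: the extra file is accepted
```

With `strict=True` the same directory would be rejected, which is the restriction this PR lifts.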

I think this goes hand-in-hand with https://github.com/kensho-technologies/pyctcdecode/pull/32#discussion_r758314862 to allow more files in the directory. It would greatly facilitate combining speech model files and decoder files in a "not-too-nested" way (see comment below).