kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

When to move to v1.0.0? #74

Open lopez86 opened 2 years ago

lopez86 commented 2 years ago

I think it would be good to move the version to v1.0.0 soon, and to keep an issue open for discussing it. There are several things that should probably be done before that happens:

  1. Figure out a more generic saving/loading scheme that can be extended for different language models besides just the provided LanguageModel class. See this issue.
  2. Remove the explicit dependence on kenlm in the AbstractLanguageModel and Decoder classes. See this issue.
  3. Make sure the documentation and notebooks are fully up to date.
  4. (Maybe) Refactor so that the kenlm classes are contained in their own file instead of in the main language model and decoder files. This would break imports since anything mentioning kenlm would now be in a different module.
  5. (Maybe) Add an abstract decoder class to allow for extending with alternate decoder classes. The most basic API would just require a decode() and a decode_batch() function, but decode_beams() and decode_beams_batch() might be useful for beam-search decoders.
  6. (Maybe) There have been some requests for including per-word scores in the output. Settling on a way to do that might be another good feature improvement to aim for.
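
To make item 5 concrete, here is a minimal sketch of what such a base class could look like. Everything in it is hypothetical: the names AbstractDecoder and GreedyDecoder, the Logits alias, and the simplified decode_batch() signature (no multiprocessing pool argument) are assumptions for illustration, not the existing pyctcdecode API.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence

# Hypothetical alias: one utterance as a (time, vocab) matrix of scores.
Logits = Sequence[Sequence[float]]


class AbstractDecoder(ABC):
    """Hypothetical minimal decoder interface: decode() and decode_batch()."""

    @abstractmethod
    def decode(self, logits: Logits) -> str:
        """Return the decoded text for a single utterance."""

    def decode_batch(self, logits_list: Sequence[Logits]) -> List[str]:
        """Default batch decoding: decode each utterance in turn."""
        return [self.decode(logits) for logits in logits_list]


class GreedyDecoder(AbstractDecoder):
    """Example subclass: best-path (argmax) CTC decoding without beam search."""

    def __init__(self, labels: Sequence[str], blank_idx: int = 0) -> None:
        self._labels = list(labels)
        self._blank_idx = blank_idx

    def decode(self, logits: Logits) -> str:
        chars = []
        prev_idx = None
        for frame in logits:
            # Index of the highest-scoring label in this frame.
            idx = max(range(len(frame)), key=frame.__getitem__)
            # CTC collapse: skip repeated labels and blanks.
            if idx != prev_idx and idx != self._blank_idx:
                chars.append(self._labels[idx])
            prev_idx = idx
        return "".join(chars)
```

Giving decode_batch() a default implementation is one possible design choice; a beam-search subclass could override it to run in parallel and add decode_beams() and decode_beams_batch() on top.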

yashjogi commented 1 year ago

On point 6: timestamps are currently calculated for each word by keeping two variables, "frame_list" and "frames". In a similar fashion, we could keep two more variables, "word_confidence_list" and "word_confidence", and update them the same way the timestamps are updated. Unlike timestamps, however, this would also require changes to the _merge_beams function so that the word confidence scores get merged as well, just like the logit scores are merged.

Is that correct, @lopez86? I have never contributed to an open source project on GitHub, so it would be great if I could contribute this word confidence feature.

cc: @patrickvonplaten