srvk / eesen

The official repository of the Eesen project
http://arxiv.org/abs/1507.08240
Apache License 2.0

out of vocabulary detection #83

Closed madhavsund closed 7 years ago

madhavsund commented 8 years ago

How can I detect out-of-vocabulary (OOV) words during recognition?

yajiemiao commented 8 years ago

They cannot be detected and added automatically. If you model characters as CTC targets, that could be easier, as you don't need to get the pronunciation for OOV words.
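
To illustrate why character targets sidestep the pronunciation problem, here is a minimal sketch, not Eesen's actual data preparation. The symbol names `<blk>` and `<space>`, the helper functions, and the toy transcripts are all assumptions made for this example. It shows that a word absent from the training text can still be encoded as CTC labels, and how a greedy CTC output is collapsed back into text.

```python
# Illustrative sketch only; names and data are hypothetical, not taken from Eesen.
BLANK = "<blk>"    # CTC blank symbol (assumed name)
SPACE = "<space>"  # word-boundary symbol (assumed name)


def build_char_table(transcripts):
    """Map every character seen in the training transcripts to an integer label."""
    chars = sorted({ch for line in transcripts for ch in line if ch != " "})
    symbols = [BLANK, SPACE] + chars  # the blank takes index 0
    return {sym: idx for idx, sym in enumerate(symbols)}


def encode(text, table):
    """Turn a transcript into a sequence of character labels for CTC training."""
    return [table[SPACE] if ch == " " else table[ch] for ch in text]


def ctc_collapse(frame_labels, table):
    """Greedy CTC post-processing: merge repeated labels, then drop blanks."""
    inverse = {idx: sym for sym, idx in table.items()}
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != table[BLANK]:
            sym = inverse[label]
            out.append(" " if sym == SPACE else sym)
        prev = label
    return "".join(out)


train = ["the cat sat", "a hat"]
table = build_char_table(train)

# "chat" never occurs in the training transcripts, but all of its characters do,
# so it can be encoded without any pronunciation lexicon entry.
labels = encode("the chat", table)

# A made-up per-frame label sequence containing repeats and blanks.
decoded = ctc_collapse([7, 7, 0, 5, 4, 4, 1, 3, 0, 5, 2, 7], table)
```

A character that never appears in training (for example a new accented letter) would still raise a `KeyError` in `encode`, so this removes the need for a lexicon, not the need for character coverage.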

madhavsund commented 8 years ago

Is it possible to do this as in CMUSphinx? http://cmusphinx.sourceforge.net/wiki/sphinx4:rejectionhandling

yajiemiao commented 8 years ago

Not really.

madhavsund commented 8 years ago

ok.

  1. How can I get the confidence score of each recognized word?
  2. How can I get the N-best results from a lattice?

fmetze commented 8 years ago

We added code to the TEDLIUM recipe to do this; see asr_egs/tedlium/v1/local/score_sclite.sh
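
For readers who want the general idea behind the two questions, the following is a rough sketch and not what `score_sclite.sh` does. It assumes an N-best list has already been extracted from a lattice as (hypothesis, log-score) pairs; the function names and the toy scores are made up for illustration. Hypothesis posteriors come from a softmax over the scores, and a word's confidence is the total posterior of the hypotheses that contain it (word position and timing are ignored here, which a real lattice-based method would not do).

```python
import math

# Illustrative sketch only; the data and function names are hypothetical.


def top_n(nbest, n):
    """Return the n highest-scoring (hypothesis, log_score) pairs."""
    return sorted(nbest, key=lambda pair: pair[1], reverse=True)[:n]


def nbest_posteriors(nbest, scale=1.0):
    """Softmax over (optionally scaled) log scores: one posterior per hypothesis."""
    logs = [scale * score for _, score in nbest]
    peak = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logs]
    total = sum(exps)
    return [e / total for e in exps]


def word_confidences(nbest, scale=1.0):
    """Confidence of each word in the best hypothesis, computed as the summed
    posterior of all hypotheses that contain that word."""
    post = nbest_posteriors(nbest, scale)
    best = max(range(len(nbest)), key=lambda i: nbest[i][1])
    result = []
    for word in nbest[best][0].split():
        conf = sum(p for (hyp, _), p in zip(nbest, post) if word in hyp.split())
        result.append((word, conf))
    return result


# Toy N-best list with made-up log scores.
nbest = [
    ("the hat sat", math.log(0.3)),
    ("the cat sat", math.log(0.6)),
    ("a cat sat", math.log(0.1)),
]

two_best = top_n(nbest, 2)
confidences = word_confidences(nbest)
```

A low confidence on a word in the best hypothesis can serve as a hint that the region is unreliable, but it does not by itself identify an OOV word.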

riebling commented 7 years ago

Closing since there has been no response in 3 months; assuming the proposed solution is acceptable.