Closed rbracco closed 2 years ago
Thanks for this-- would it be possible to share the logit matrix in a gist so we can take a closer look at this?
Absolutely, please let me know if there's any other way I can help, or if you need it in a different format. Thank you! https://gist.github.com/rbracco/493a7886e4305a0b8021af660ce92884
Thanks, is that the same logit matrix though? I get:
θɹu ʌ sɪɹiz ʌv ɪnfɔɹmʌl ɑʊtɹɪʤ sɛʃʌnz oʊvʌɹ ðʌ nɛkst fju mʌnθs
Oops, so sorry, I forgot I had continued playing around with it; the gist has been edited to contain the proper logits.
Now getting: ðʌ fɔɹmæt ɪnsɛpʃʌn dɑkjʌmɛnt ðɪs wik fɔɹ sɪgnʌʧʌɹ
I should expect ðɪs wɪl bi dɪskʌst wɪð ɪndʌstɹi, right?
Ugh I'm really sorry about that, the gist has been updated for what will hopefully be the final time.
It may take some futzing with the defaults in order to see good performance for any given use case. For example, if I run:
text = decoder.decode(
    probabilities,
    hotwords=hotwords,
    hotword_weight=100,
    beam_prune_logp=-100,
    token_min_logp=-10,
)
I get:
ðɪs wɪd bi dɪskʌsd wɪd ɪndʌstɹi
which is, if not a great decoding, hopefully at least evidence that the hotwords feature is working as intended.
It may be that the chosen defaults for beam_prune_logp and token_min_logp should be different when the user submits hotwords, but it's hard to tell from a single example. Ideally the user would perform a hyperparameter search to tune the decoder to their use case. I'm not opposed to adding a convenience function to that effect, provided it covers most of what people expect out of such a function, something like:
decoder = pyctcdecode.build_and_tune_decoder_from(
    train_logit_matrices,
    train_transcriptions,
    alphabet,
    possible_hotwords,
    metric='wer',
    tuning_iterations=100,
)
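Concretely, the loop inside such a helper might look something like the sketch below. Everything here is hypothetical (the wer helper, the decode_fn stand-in for decoder.decode, and the toy grids are not part of pyctcdecode); it just illustrates picking the parameter pair with the lowest word error rate.

```python
import itertools

def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    r, h = ref.split(), hyp.split()
    # Standard dynamic-programming Levenshtein distance over words.
    d = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(h, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (rw != hw))
    return d[len(h)] / max(len(r), 1)

def tune(decode_fn, samples, beam_prune_grid, token_min_grid):
    """Grid-search the two pruning parameters; return (best_wer, beam_prune_logp, token_min_logp)."""
    best = None
    for bp, tm in itertools.product(beam_prune_grid, token_min_grid):
        score = sum(wer(ref, decode_fn(x, bp, tm)) for x, ref in samples) / len(samples)
        if best is None or score < best[0]:
            best = (score, bp, tm)
    return best
```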
@gkucsko wdyt?
Thanks, this will at least give me some rabbit holes to go down and see if I can tune a decent decoder myself.
Yes, pyctcdecode relies on a non-zero logp to propose a next character (token_min_logp). Hotwords will only upweight that suggestion, not propose their own next character. Something we could look into adding in the future if there is need for it.
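Roughly, the interaction looks like this. A toy sketch with made-up numbers, not the actual pyctcdecode internals: a token whose frame log-probability falls below token_min_logp is never proposed, so a hotword bonus has nothing to attach to.

```python
import math

# Hypothetical per-frame log-probabilities for a few tokens.
frame_logp = {"t": math.log(0.6), "d": math.log(0.3), "v": math.log(1e-8)}

token_min_logp = -5.0
# Only tokens above the threshold are proposed as next characters.
candidates = {tok for tok, lp in frame_logp.items() if lp > token_min_logp}

# A hotword bonus is added on top of an already-proposed candidate;
# it cannot resurrect a token that was pruned away above.
hotword_weight = 10.0
boosted = {tok: frame_logp[tok] + (hotword_weight if tok == "d" else 0.0)
           for tok in candidates}
```

Here "v" is pruned regardless of any hotword, while the hotword bonus lets "d" outscore "t" among the surviving candidates.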
I am trying to test hotword boosting on a model meant to diagnose pronunciation mistakes, so the tokens are in IPA (international phonetic alphabet), but otherwise everything should work the same.
I have two related issues.
Target: ðɪs wɪl bi dɪskʌst wɪð ɪndʌstɹi (this will be discussed with industry)
Hotword used: dɪskʌsd (changing t for d)
Model output after CTCDecode: ðɪs wɪl bi dɪskʌs wɪð ɪndʌstɹi (the t at the end of 'dɪskʌs' disappears)
I didn't think this was possible based on how hotword boosting works. Am I misunderstanding, or is this potentially a bug?