Closed by rhamnett 3 years ago
Hey, two quick thoughts on things to try: 1) Case: it looks like the vocabulary is all uppercase. Is the language model also all uppercase, or is it lowercase? 2) Missing unigrams: can you explicitly pass all known unigrams into the language model? That way the decoder can build a trie under the hood to efficiently decode partial words.
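Both checks can be sketched in a few lines of plain Python. This is only an illustration: the helper names `read_arpa_unigrams` and `case_of`, the toy ARPA text, and the label list are all made up for the example, and the commented-out call at the end assumes the decoder is pyctcdecode, which the thread does not confirm.

```python
# Hypothetical helpers: compare the casing of the acoustic vocabulary with
# the casing of the LM's unigrams, and collect the unigrams from ARPA text.

def read_arpa_unigrams(arpa_lines):
    """Return the words in the \\1-grams: section, skipping <s>, </s>, <unk>."""
    unigrams = []
    in_unigrams = False
    for raw in arpa_lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if not in_unigrams or not line:
            continue
        if line.startswith("\\"):  # next section (\2-grams:) or \end\
            break
        word = line.split()[1]  # ARPA columns: logprob, word, optional backoff
        if not (word.startswith("<") and word.endswith(">")):
            unigrams.append(word)
    return unigrams


def case_of(tokens):
    """Classify the letters in tokens as 'upper', 'lower', 'mixed' or 'none'."""
    letters = "".join(ch for tok in tokens for ch in tok if ch.isalpha())
    if not letters:
        return "none"
    if letters.isupper():
        return "upper"
    if letters.islower():
        return "lower"
    return "mixed"


# Toy ARPA content standing in for a real LM file (assumption).
SAMPLE_ARPA = """\\data\\
ngram 1=5
ngram 2=2

\\1-grams:
-1.0\t<unk>
-1.0\t<s>\t-0.3
-1.0\t</s>
-0.8\thello\t-0.3
-0.8\ttoday\t-0.3

\\2-grams:
-0.5\t<s> hello
-0.5\thello today

\\end\\
"""

# Toy uppercase label set, similar in spirit to a wav2vec vocabulary (assumption).
labels = ["", " ", "'"] + list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

unigrams = read_arpa_unigrams(SAMPLE_ARPA.splitlines())
vocab_case = case_of(labels)
lm_case = case_of(unigrams)
case_mismatch = vocab_case != lm_case

if case_mismatch:
    print(f"vocabulary is {vocab_case} but LM unigrams are {lm_case}")

# If the decoder is pyctcdecode (an assumption), the unigram list would be
# passed in when the decoder is built, roughly like this:
#   from pyctcdecode import build_ctcdecoder
#   decoder = build_ctcdecoder(labels, kenlm_model_path="lm.arpa", unigrams=unigrams)
```

If the check reports a mismatch, one option is to rebuild the LM from text in the same case as the vocabulary; another is to convert the labels to the LM's case before building the decoder. Which of the two is more convenient depends on the setup.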
Ahh, great suggestions, thanks! Will report back.
Closing for now, unless the issues persist.
Thanks, your suggestion about case helped, much appreciated. I have not tried the second option yet.
Hi, when using Hugging Face and FB wav2vec, I'm getting missing spaces with various language models that I have created, including a simple LM with just a few phrases. Could you help me work out what is wrong?
result:
No LM: ["OH HELLO IT'S PAKER ID HER I'M BRINGING UP THIS PAPER OL ODAY HELLO MY SISTER'S BEEN BUSY"]
With LM: "OH HELLO IT'S PAKERIDHER I'MSBRINGINGUPTHIS PAPERAALTODAY HELLOMY SISTER'SBEEN BUSY"
No LM:
"WANTED CHIEF JUSTICE OF THE MASSACHUSETTS SUPREME COURT IN APRIL THE S J C 'S CURRENT LEADER EDWARD HENNISE REACHES THE MANDATORY RETIREMENT AGE OF SEVENTY AND THE SUCCESSOR IS EXP"
With LM:
"WANTED CHIEF JUSTICE OF THE MASSACHUSETTS SUPREME COURT IN APRIL THE S C'S CURRENT LEADER "EDWARDHENNISE" REACHES THE MANDATORY RETIREMENT AGE OF SEVENTY AND THE SUCCESSOR IS EXP"
No LM: 'BOIL THEM BEFORE THEYARE PUT INTO THE SOUP OR OTHER DISH THEY MAYBE INTENDED FOR'
With LM: 'BOIL THEM BEFORE THEY ARE PUT INTO THE SOUP OR OTHER DISH THEY MAY BE INTENDED FOR'