I put the code in the decode.py file and used an audio clip with the reference label
"但是从尺寸来讲呢往往呢就这个尺寸一有一个城市都是大于一般欧洲的一个国家的".
Decoding this audio with the modified beam search method gives
"但是从尺寸来讲呢往往呢就这个尺寸有的一个城市都是大于一般欧洲的一个国家的".
I then decoded the audio using fast_beam_search_one_best() with the generated label-check FSA. I expected the RNN-T model to give a result like
"但是从尺寸来讲呢往往呢就这个尺寸#del有#sis的#eis一个城市都是大于一般欧洲的一个国家的".
But the actual result is
"但是从尺寸来讲呢往往呢就这个尺寸#sis有#eis#sis的#eis#sis一#eis#sis个#eis#sis城#eis#sis市#eis#sis都#eis是#sis大#eis#sis于#eis#sis一#eis#sis般#eis#sis欧#eis#sis洲#eis#sis的#eis#sis一#eis#sis个#eis#sis国#eis#sis家#eis#sis的#eis".
Why does this happen?
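To make the expected tagging concrete, here is a minimal sketch that derives it from the reference label and the modified beam search result with a plain character-level alignment. It uses Python's `difflib`, not k2 or the FSA, so it only illustrates the output I was hoping for. The function name `tag_edits` is made up for this example, and the handling of substitutions (a `#del` followed by a `#sis…#eis` pair) is an assumption, since this utterance contains none.

```python
from difflib import SequenceMatcher


def tag_edits(ref: str, hyp: str) -> str:
    """Align hyp against ref per character and tag the differences.

    Assumed tag meanings (inferred from the expected result in the post):
      #del          a reference character missing from the hypothesis
      #sis x #eis   a hypothesis character x that is not in the reference
    """
    out = []
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append(hyp[j1:j2])
            continue
        if op in ("delete", "replace"):
            # One #del per missing reference character.
            out.append("#del" * (i2 - i1))
        if op in ("insert", "replace"):
            # Each extra hypothesis character gets its own #sis/#eis pair.
            out.extend(f"#sis{ch}#eis" for ch in hyp[j1:j2])
    return "".join(out)


ref = "但是从尺寸来讲呢往往呢就这个尺寸一有一个城市都是大于一般欧洲的一个国家的"
hyp = "但是从尺寸来讲呢往往呢就这个尺寸有的一个城市都是大于一般欧洲的一个国家的"
print(tag_edits(ref, hyp))
```

In other words, I expected only the deleted "一" and the inserted "的" to be tagged, with every other character passing through untagged.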
I'm trying to build a label-check FSA for use with an RNN-T model. Here is the schematic diagram of the FSA.
Here is my code to create the decoding FSA for the RNN-T model.
Here is the test code to draw the FSA.
The resulting test FSA looks like this.