Closed BradNeuberg closed 5 years ago
@kdavis-mozilla @lissyx
Thank you for filling out the issue template :)
This misbehavior seems to happen only when the acoustic model is not doing a very good job. I agree the decoder should not degrade to that level. I haven't had the chance to debug this issue other than tweaking the decoder hyperparameters to try to alleviate the problem. I'll take a closer look.
Yes. In some cases when the acoustic model is not performing well, the decoder falls into this weird state of gluing words together. I'm hoping it can be fixed by tweaking the beam search implementation.
What assumptions does the acoustic model make (i.e., what are the distribution and characteristics of the audio training data)? The audio I provided sounds pretty clear IMHO, but perhaps the audio training data doesn't have enough diversity to help the deep net generalize (i.e., the deep net is essentially overfitting to the training data and isn't generalizing well).
@reuben Any further progress on this by any chance?
Facing the same issue. Any progress or way out to improve the performance?
How can I train my model without the language model?
@nyro22 Training never involves the language model. Computing WERs, however, does.
I am facing the same issue on a rather similar configuration to the one described above. Was there any progress on this? Thanks!
facing the same problem.....
/data/home/DeepSpeech# /data/home/DeepSpeech/deepspeech phoneme_output_graph.pb phoneme.txt A2_1.wav
TensorFlow: v1.6.0-11-g7554dd8
DeepSpeech: v0.1.1-48-g31c01db
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-05-03 10:27:27.750965: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-05-03 10:27:28.111299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties: name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112 pciBusID: 0000:02:00.0 totalMemory: 23.90GiB freeMemory: 22.71GiB
2018-05-03 10:27:28.111338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2018-05-03 10:27:28.318726: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22058 MB memory) -> physical GPU (device: 0, name: Tesla M40 24GB, pci bus id: 0000:02:00.0, compute capability: 5.2)
Hi @bolt163, it seems that your problem is somewhere else... Could you provide the expected transcript of this wav?
Having the same issues. If I modify the beam search algorithm myself, what would be the steps to recompile with the updated beam search?
@reuben I am also facing the same issue. Any suggestions?
Same here. Also, seems to be happening when using out-of-vocabulary terms.
Probably the bug is somewhere in this function:
It seems that the problem is that sequences with out-of-vocabulary words receive a higher score without spaces than with spaces.
Does the beam search use length normalization? According to Andrew Ng, it improves beam search by reducing the penalty for outputting sentences with a higher number of words. Andrew talks about it in this video.
EDIT: I just realized that word_count_weight is performing this role.
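For anyone following along, here is a rough sketch of the two ideas; the exact form of the combined score and the function names are my own assumptions for illustration, not the actual DeepSpeech decoder code:

#include <cmath>
#include <cstddef>

// (a) Word-insertion bonus, roughly the role word_count_weight plays: each
// completed word adds a constant bonus to the hypothesis score.
double combined_score(double am_score, double lm_score, std::size_t word_count,
                      double lm_weight, double word_count_weight) {
  return am_score + lm_weight * lm_score + word_count_weight * word_count;
}

// (b) Length normalization as Andrew Ng describes it: divide the accumulated
// log-probability by length^alpha (alpha in [0, 1]) so hypotheses are not
// penalized merely for containing more terms.
double length_normalized(double total_log_prob, std::size_t length,
                         double alpha) {
  return total_log_prob / std::pow(static_cast<double>(length), alpha);
}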
@GeorgeFedoseev I've been trying to debug the part of the code you pointed to, and I noticed some weird behavior. Below, I've printed the scores for, respectively: (1) a common word from my corpus; (2) a rare word from the corpus; (3) an invalid/out-of-vocabulary word; (4) the value of the variable oov_score_. I am printing the scores at several different states:
'não': -2.9632
'informação': -4.43594
'fdfdg': -5.15036
oov_score: -4.70759
------------------------------
'não': -2.97739
'informação': -4.45013
'fdfdg': -5.16455
oov_score: -4.70759
------------------------------
'não': -2.88466
'informação': -5.84782
'fdfdg': -6.56224
oov_score: -4.70759
Notice that the oov_score_ is not the same as the invalid word's score, and in some cases it is even higher than a valid word's score. I tried to add the following lines to the code:
// Recompute the OOV score from the current decoder state instead of the
// precomputed null-context value:
Model::State out;
oov_score_ = model_.FullScore(from_state.model_state, model_.GetVocabulary().NotFound(), out).prob;
and now the score of the invalid word and the variable appear similar. When testing on my examples, it is not enough to solve the problem, but it certainly reduced the gluing together of words.
PS: words are from my pt-br language corpus
@bernardohenz I think you printed scores for (1), (2) and (3) that depend on state, but oov_score_ does not depend on state (in master), and you cannot compare them. If you print (3) with model.NullContextState(), wouldn't it be the same as oov_score_?
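To make the state dependence concrete, here is a minimal KenLM-only sketch (the LM path and the example word are placeholders; this is not DeepSpeech code, just raw KenLM calls):

#include "lm/model.hh"

int main() {
  lm::ngram::Model model("lm.binary");  // placeholder path
  const auto &vocab = model.GetVocabulary();

  // Score <unk> after some context: the result depends on the state reached.
  lm::ngram::State ctx = model.BeginSentenceState(), mid, out;
  model.FullScore(ctx, vocab.Index("não"), mid);  // build up some context
  float oov_in_context = model.FullScore(mid, vocab.NotFound(), out).prob;

  // Score <unk> from the null context: this is what the precomputed
  // oov_score_ on master corresponds to.
  float oov_null_context =
      model.FullScore(model.NullContextState(), vocab.NotFound(), out).prob;

  // oov_in_context and oov_null_context will generally differ, which is why
  // the per-state prints above cannot be compared with oov_score_ directly.
  (void)oov_in_context;
  (void)oov_null_context;
  return 0;
}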
@GeorgeFedoseev yes, it is true. But why wouldn't oov_score depend on the state? I think it makes sense to compute the oov_score for each state. What do you think?
@bernardohenz as I understand the code: when the construction of a word is not finished yet (the if (!alphabet_.IsSpace(to_label)) part), to tell beam search that it's going in the right direction, we add the minimum unigram score of the words that this search can lead to. And this minimum unigram score is precomputed without state (with model.NullContextState()) and saved in the trie file. To get this score dynamically depending on state, you would need, for each prefix, to find all possible words it can lead to and select the minimum score (which is probably very slow).
So oov_score_ doesn't depend on state because we are comparing OOV branches of the beam search with in-vocabulary branches, which are scored using scores from the trie file (and those scores don't depend on state).
But the problem with assigning such a minimum unigram score (or oov_score_) is that, during beam search, the algorithm prefers to concatenate lots of characters rather than choosing a space and finishing a low-probability word (such as my (2) example).
One idea that occurred to me is to penalize longer words, to avoid cases where the algorithm tries to concatenate more than 3 words together without a space.
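A very rough sketch of that penalty idea (my own guess at how it could be wired in, not existing DeepSpeech code):

#include <cstddef>

// Subtract a penalty that grows with the number of characters emitted since
// the last space, so beams that never close a word gradually fall behind.
double apply_long_word_penalty(double score, std::size_t chars_since_space,
                               std::size_t max_word_len, double penalty) {
  if (chars_since_space > max_word_len) {
    score -= penalty * static_cast<double>(chars_since_space - max_word_len);
  }
  return score;
}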
Replacing
oov_score_ = model_.FullScore(model_.NullContextState(), model_.GetVocabulary().NotFound(), out).prob;
with
oov_score_ = -1000.00;
helped. Did I just raise another error?
In fact I created another variable (oov_score_2) to compute this value (oov_score_ can't be modified inside the function).
And I don't know if it is a good idea to set oov_score_ = -1000.00;, since this value is used while you are composing the word (char by char). The point of 'correcting' the oov_score_ is to keep the algorithm from simply deciding to glue all the characters together (without a space char).
@bernardohenz I think that in that part (the if (!alphabet_.IsSpace(to_label)) branch) the requirement should just be that an OOV word gets a score lower than any vocabulary word.
Try increasing word_count_weight_ from the default 1 to something like 3.5. This resulted in less concatenation for me and decreased my WER by 3-4%.
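A back-of-the-envelope illustration of why raising the bonus helps (all numbers are made up, only the mechanics matter): the spaced hypothesis collects the word-count bonus twice, so a larger weight widens its advantage over the glued one:

#include <cstdio>

int main() {
  // Hypothetical LM log-scores for two competing beams.
  const double glued_lm = -9.0;    // "informaçãofdfdg"  (1 word)
  const double spaced_lm = -10.2;  // "informação fdfdg" (2 words)
  for (double w : {1.0, 3.5}) {
    std::printf("w=%.1f  glued=%.2f  spaced=%.2f\n",
                w, glued_lm + 1 * w, spaced_lm + 2 * w);
  }
  // w=1.0: glued wins (-8.00 vs -8.20); w=3.5: spaced wins (-3.20 vs -5.50).
  return 0;
}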
I've implemented length normalization (word_count_weight was only a gross approximation) as well as switch to a fixed OOV score (which was in a TODO list for a long time) as part of the streaming changes, which will be merged soon for our next release. When we have binaries available for testing I'll comment here so anyone interested can test if it improves the decoder behavior on the cases described here. Thanks a lot for the investigation and suggestions, @bernardohenz, @GeorgeFedoseev and @titardrew!
@BradNeuberg @reuben is this issue closed? I'm running the 0.2.0va7 version of DeepSpeech (with ldc93s1 and a new wav file), and the result (like hiieieddiitwenty) doesn't match the language model. If there is any tweak to force it to respect the model, I'm buying it, even if it is time consuming.
Hi @reuben, I am also seeing this problem in the master branch. Could you provide a patch with your implementation to deal with it?
Hi @reuben when will you have the binaries available?
+1😉
They will be available with our next release, v0.2, when it is ready :)
Hi @reuben Any update on these binaries? I too would like to test their impact on decoder behavior.
We're currently training a model for the v0.2 release. Send me an email at {my github username} at mozilla.com and I'll give you access to a preliminary trained model so you can test the code changes.
If you have your own model and just want the binaries, they're available here: https://tools.taskcluster.net/groups/ClsXrFSbTJ6uUkEAPqFG8A
The Python and Node packages are also available, just specify version 0.2.0-alpha.9
@reuben Dropped a mail to you !!
I need the new decoder library (the .so binary for Linux x64); how can I download it from the URL given by @reuben? I am a bit lost on that webpage.
When I click on DeepSpeech Linux AMD64 CPU, then on artifacts, and then download public/native_client.tar.xz, I don't see any changes in my decoded output when using this .so library compared to the current one. There are still only one or two words followed by a very, very long one without spaces, despite ensuring that my model frequently outputs white spaces, and the beam and greedy decoding output looks fine.
Just tested the 0.2.0 release (deepspeech and models); I still get long words outside the English vocabulary.
This example is a phone call recording (one channel out of two). Speech-to-text works well for the first sentence (a pre-recorded welcome message). The rest is part of a real conversation, where speech-to-text doesn't work properly.
The command and outputs are:
(deepspeech-venv) jonathan@ubuntu:~$ deepspeech --model ~/deepspeech-0.2.0-models/models/output_graph.pb --audio ~/audio/C2AICXLGB3D2SMK4WPZF26KEZTRUA6OYR1.wav --alphabet ~/deepspeech-0.2.0-models/models/alphabet.txt --lm ~/deepspeech-0.2.0-models/models/lm.binary --trie ~/deepspeech-0.2.0-models/models/trie
Loading model from file /home/jonathan/deepspeech-0.2.0-models/models/output_graph.pb
TensorFlow: v1.6.0-18-g5021473
DeepSpeech: v0.2.0-0-g009f9b6
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2018-09-20 11:02:49.456955: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.134s.
Loading language model from files /home/jonathan/deepspeech-0.2.0-models/models/lm.binary /home/jonathan/deepspeech-0.2.0-models/models/trie
Loaded language model in 3.85s.
Running inference.
thank you for calling national storage your call may be recorded for coaching and quality the poses place let us not an if ye prefer we didn't record your colt to day in wall constrashionalshordistigwisjemaigay am so it just so he put in your code held everything in a disbosmygriparsesnwygorighticame so she's not like um that's all good if you won't care if i can just reserve something from my end over the foreign am i can reserve at the same on mine price you will looking out as well um which sent a and a unit is he looking out without which location for it an put it by sereerkapcoolofmijustrynorfrommians or a man we after the ground floor on the upper of a at
Inference took 33.947s for 58.674s audio file.
The audio can be found from https://s3.us-east-2.amazonaws.com/fonedynamicsuseast2/C2AICXLGB3D2SMK4WPZF26KEZTRUA6OYR1.wav
@zhao-xin I'm facing the exact same problem. I am working with call recording. Were you able to fix this?
@sunil3590 I feel this is not an engineering issue. The acoustic model is not trained on phone call conversations, and neither is the language model, am I right?
We plan to collect our own data to fine-tune DeepSpeech models so they can be used in the real world.
Are there any updates on this? I still have this issue, and I am pretty sure it's not the model's fault, since with normal decoding (greedy or plain beam search) I never get these very long words.
This is a big problem for me, since those long words obviously mess up the evaluation, but a language model would be necessary to get acceptable performance.
@reuben is currently working on moving to ctcdecode, which among other things should fix this issue.
Could anyone who's seeing this issue test the new decoder on master?
There's native client builds here: https://tools.taskcluster.net/groups/FyclewklSUqN6FXHavrhKQ
The acoustic model is the same as v0.2, and the trie is in data/lm/trie.ctcdecode after you update to latest master. Testing with some problematic examples I had shows much better results, but the links in this thread are all broken so I couldn't test with your files.
Let me know how it goes.
Sorry, those instructions are incorrect. The acoustic model is the same as v0.2 but you need to re-export it with the master code. Alternatively you can grab it from here: https://github.com/reuben/DeepSpeech/releases/tag/v0.2.0-prod-ctcdecode
@reuben 's new work is working well for me on long, clean recordings.
I'm using:
- the deepspeech Node package, npm link'ed locally
- output_graph.pbmm from Reuben's release (as linked above)
The inference for a 45s podcast snippet seems pretty decent:
why early on in the night i mean i think there are a couple of states that are going to be really keep kentucky and virginia kentucky closes its poles a half in the eastern times on half in the central time on so that means that half of the states at six o'clock to visit seven o'clock and so have a lot of results and in watching one particular congressional district raciness district between antibarbarus i disengaged morabaraba a republican and this is a race that really should not be on the map this is a race that should be republican territory and if this race is a searching for much of the night in the democratizing well there that's a pretty good sign that the wave will be building
The inference for two recordings I made myself is almost totally wrong, but does not have incorrectly dropped spaces. I'm guessing the poor results are due to recording quality?
12s recording made with Bose QC35 II
he gravitationless theocratic circuitously manipulate intermediately creation of images and a frame buffer intended for alcohol
11s recording made with mid-'13 Macbook Air built-in mic
a gravitational latrocinia idly manipulate an alternator exploration of images in a frame of her intolerable
@spencer-brown On the recordings you made yourself, did you record directly to 16kHz, 16-bit, mono audio? (The recordings sound like they were made at a lower sample rate and/or bit depth.)
Also, I'd tend to agree that the drop in the recording quality is likely largely to blame for the poor results on the recordings you made yourself. We're currently training models that will be more robust to background noise.
Ah, no, I did not - thanks! In follow-up tests using those settings I'm seeing about 50% accuracy with the Bose headphones and nearly 0% with the MacBook Air mic. The recordings still sound crackly relative to the training recordings.
Re: background-noise-robust models - exciting!
For anyone else still having trouble with this, I was able to make it work in the end by installing PyTorch along with the ctcdecode library and then using that on top of my existing code; it worked right out of the gate with a KenLM language model!
@f90 you shouldn't need PyTorch (or the ctcdecode library) to use the new native client, the decoder is built-in.
I'm also experiencing the same issue, with words gluing together. I'm trying to run the new version as described by @spencer-brown above, but I'm running into some issues.
DeepSpeech v0.3 is working on my system, but the new version is throwing an error.
I'm using:
I downloaded the files, ran npm install, and then ran the command:
node client.js --audio="./--audios_for_testing/90secondtest.wav" --model="./output_graph.pbmm" --trie="./trie.ctcdecode" --lm="./deepspeech_models/lm.binary" --alphabet="./deepspeech_models/alphabet.txt"
This is the output:
Loading model from file ./output_graph.pbmm
TensorFlow: v1.11.0-11-gbee825492f
DeepSpeech: unknown
2018-11-03 17:35:42.654139: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
dyld: lazy symbol binding failed: Symbol not found: __ZN2v87Isolate19CheckMemoryPressureEv
Referenced from: /Users/derekpankaew/Dropbox/Javascript Programming/speech_recognition/lib/binding/v1.0.0/darwin-x64/node-v57/deepspeech.node
Expected in: flat namespace
dyld: Symbol not found: __ZN2v87Isolate19CheckMemoryPressureEv
Referenced from: /Users/derekpankaew/Dropbox/Javascript Programming/speech_recognition/lib/binding/v1.0.0/darwin-x64/node-v57/deepspeech.node
Expected in: flat namespace
Abort trap: 6
Would love to get the new version to work - any thoughts?
Your output shows that it's not an official build. Please use official builds before reporting issues, and please give more context on your system.
The binary files and trie in https://github.com/mozilla/DeepSpeech/tree/master/data/lm alleviate this long-word problem. However, my results are not as good as @spencer-brown's for the same text.
I applied deepspeech with the binary files and trie mentioned above (all the rest is just a straight application of the instructions in "Using the model" of https://github.com/mozilla/DeepSpeech).
Using ffmpeg to change the sampling rate to 16000
ffmpeg -i midterm-update-clipped.wav -acodec pcm_s16le -ac 1 -ar 16000 midterm-update-clipped2.wav
I get the following transcription for the 45-second podcast mentioned above (https://drive.google.com/file/d/1rmje0llC-PXJgTiAiuQcsPRSjaaWfsv_/view?usp=sharing):
Loading model from file models/output_graph.pbmm
TensorFlow: v1.11.0-9-g97d851f
DeepSpeech: v0.3.0-0-gef6b5bd
Loaded model in 0.013s.
Loading language model from files models/lm2.binary models/trie2
Loaded language model in 0.000145s.
Running inference.
why early on in the night i mean i think there are a couple states that are going to be really keep can tucky and virginia contucky closes its poles a half in te the eastern times own half an the sentral times on so that means that half of the states at six o'clock afh o vis had seven o'clock ah and joll have a lot of results and an waschings one particular congressional district race o six congressinal district between a andi bar and maganme graph i ad bis emmigrass the democrat bars in combent a republican and this is a race that really should not be on the map this is a race that should be republican territory and if this race is a u seem magrapha leading for much of the night and de democratis doing well there that's a pretty good sign that the wave will be building
Inference took 41.082s for 48.489s audio file.
When using a band-pass filter:
ffmpeg -i midterm-update-clipped.wav -acodec pcm_s16le -ac 1 -ar 16000 -af lowpass=3000,highpass=200 midterm-update-clipped3.wav
I get a slightly better transcription:
TensorFlow: v1.11.0-9-g97d851f
DeepSpeech: v0.3.0-0-gef6b5bd
Loaded model in 0.0128s.
Loading language model from files models/lm2.binary models/trie2
Loaded language model in 0.000105s.
Running inference.
why early on in the night i mean i think there are a couple states that are going to be really keep can tucky and virginia contucky closes its poles a half in the the eastern times own half in the central times on so that means that half of the states it six o'clock atfe vits ad seven o'clock ah and toll have a lot of results and an waschings one particular congressional district race o six congressial district between a andi bar and maganmc graph i ed es emmograss the democrat bars in combent a republican and this is a race that really should not be on the map this is a race that should be republican territory and if this race is a you seem mograph leading for much of the night and te democratis doing well there thats a pretty good sign that the wave will be building
Inference took 43.352s for 48.489s audio file.
If anyone knows tricks to further improve results I would be really interested :)
Mozilla DeepSpeech will sometimes create long runs of text with no spaces:
This happens even with short audio clips (4 seconds) from a native speaker of American English, recorded using a high-quality microphone on Mac OS X laptops. I've isolated the problem to the interaction with the language model rather than the acoustic model or the length of the audio clips, as the problem goes away when the language model is turned off.
The problem might be related to encountering out-of-vocabulary terms.
I’ve put together test files with results that show the issue is related to the language model somehow rather than the length of the audio or the acoustic model.
I’ve provided 10 chunked WAV files at 16khz 16 bit depth, each 4 seconds long, that are a subset of a fuller 15 minute audio file (I have not provided that full 15 minute file, as a few shorter reproducible chunks are sufficient to reproduce the problem):
https://www.dropbox.com/sh/3qy65r6wo8ldtvi/AAAAVinsD_kcCi8Bs6l3zOWFa?dl=0
The audio segments deliberately include occasional out-of-vocabulary terms, mostly technical, such as “OKR”, “EdgeStore”, “CAPE”, etc.
Also in that folder are several text files that show the output with the standard language model being used, with the garbled glued-together words (chunks_with_language_model.txt).
Then, I've provided similar output with the language model turned off (chunks_without_language_model.txt).
I've included both these files in the shared Dropbox folder link above.
Here's what the correct transcript should be, manually transcribed (chunks_correct_manual_transcription.txt).
This shows the language model is the source of this problem; I've seen anecdotal reports from the official message base and blog posts that this is a widespread problem. Perhaps when the language model hits an unknown n-gram, it ends up combining all of the words together rather than retaining the spaces between them.
Discussion around this bug started on the standard DeepSpeech discussion forum: https://discourse.mozilla.org/t/text-produced-has-long-strings-of-words-with-no-spaces/24089/13 https://discourse.mozilla.org/t/longer-audio-files-with-deep-speech/22784/3
The standard client.py was slightly modified to segment the longer 15-minute audio clip into 4-second blocks.
Mac OS X 10.12.6 (16G1036)
Both Mozilla DeepSpeech and TensorFlow were installed into a virtualenv setup via the following requirements.txt file:
Did not compile from source.
Same
Used CPU only version
Used CPU only version
I haven't provided my full modified client.py that segments longer audio, but to run with a language model using the standard deepspeech command against a known 4-second audio clip included in the Dropbox folder shared above, you can run the following:
This is clearly a bug and not a feature :)