Open maniacally opened 7 years ago
Hi, and thank you! The pause annotation format looks correct, but I currently don't have any English models trained on pause-annotated data, because I don't have the right kind of data at hand. The current models in Google Drive have only been trained for the first stage, so they are unable to use pauses.
PS! If you have a sufficient amount of pause-annotated data (at least 1 million words, but more is better), you can take the first-stage model from Google Drive and train only the second stage.
How can we use the demo-model to punctuate the text as in the demo project?
Excellent library, thanks for your efforts. I'm attempting to punctuate speech-to-text output of multi-party conversations, and I was excited to see that your library supports pause durations during second-stage training. How would one query the model while providing this information? I've tried passing in the following, but it doesn't seem to produce better output:
to <sil=0.000> be <sil=1.000> or <sil=0.000> not <sil=0.000> to <sil=0.000> be <sil=1.000> that <sil=0.000> is <sil=0.000> the <sil=0.000> question <sil=1.000>
N.B. I'm using the demo model that you provided in Google Drive.
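For anyone preparing pause-annotated text in the format shown above, here is a minimal sketch of how the `<sil=...>` tokens could be derived from word timestamps. It assumes the speech-to-text system returns `(token, start_sec, end_sec)` tuples; that tuple shape, the `annotate_pauses` name, and the `final_pause` parameter are hypothetical and not part of the library. The pause after each word is taken as the gap before the next word starts.

```python
from typing import List, Tuple

# Hypothetical shape of one recognized word: (token, start_sec, end_sec).
Word = Tuple[str, float, float]


def annotate_pauses(words: List[Word], final_pause: float = 0.0) -> str:
    """Interleave each token with a <sil=X.XXX> tag giving the pause after it.

    The pause after a word is the gap between its end time and the start
    time of the next word. Overlapping timestamps are clamped to zero.
    The last word has no successor, so it gets `final_pause` (an assumption).
    """
    parts = []
    for i, (token, _start, end) in enumerate(words):
        if i + 1 < len(words):
            next_start = words[i + 1][1]
            pause = max(0.0, next_start - end)
        else:
            pause = final_pause
        parts.append(f"{token} <sil={pause:.3f}>")
    return " ".join(parts)


if __name__ == "__main__":
    sample = [("to", 0.0, 0.2), ("be", 0.2, 0.5), ("or", 1.5, 1.7)]
    print(annotate_pauses(sample))
```

As noted earlier in the thread, this only helps with a model whose second stage was trained on pause-annotated data; the models in Google Drive were trained for the first stage only.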