ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License

How to pass in pause durations when querying model #11

Open maniacally opened 7 years ago

maniacally commented 7 years ago

Excellent library, thanks for your efforts. I'm attempting to punctuate speech-to-text output of multi-party conversations. I was excited to see that your library supports pause durations during second-phase training. How would one query the model while providing this information? I've tried passing in the following, but it doesn't seem to produce better output:

to <sil=0.000> be <sil=1.000> or <sil=0.000> not <sil=0.000> to <sil=0.000> be <sil=1.000> that <sil=0.000> is <sil=0.000> the <sil=0.000> question <sil=1.000>

N.B. I'm using the demo model that you provided on Google Drive.

ottokart commented 7 years ago

Hi, and thank you! The pause annotation format looks correct, but I currently don't have any English models trained on pause-annotated data, because I don't have the right kind of data at hand. So the current models in Google Drive have only been trained for the first stage and are unable to use pauses.

PS: If you have a sufficient amount of pause-annotated data (at least 1 million words, but more is better), you can take the first-stage model from Google Drive and train only the second stage.
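
For anyone preparing such data, here is a minimal sketch of producing the pause-annotated format shown in the question from word-level timestamps. It is not part of punctuator2: the function name `annotate_pauses`, the `(token, start, end)` input layout, and the `final_pause` parameter are all assumptions made for illustration. It also assumes that the pause after a word is the gap between that word's end time and the next word's start time, which may not match how your speech recognizer reports silences.

```python
from typing import Iterable, Tuple


def annotate_pauses(
    words: Iterable[Tuple[str, float, float]],
    final_pause: float = 0.0,
) -> str:
    """Render (token, start_sec, end_sec) triples as 'token <sil=X.XXX>' text.

    Hypothetical helper: the pause after each word is taken to be the gap
    between its end time and the start time of the following word.
    """
    items = list(words)
    parts = []
    for i, (token, _start, end) in enumerate(items):
        if i + 1 < len(items):
            # Clamp at zero in case the recognizer emits overlapping words.
            pause = max(0.0, items[i + 1][1] - end)
        else:
            # No following word, so fall back to the caller-supplied value.
            pause = final_pause
        parts.append(f"{token} <sil={pause:.3f}>")
    return " ".join(parts)


if __name__ == "__main__":
    timed = [("to", 0.0, 0.2), ("be", 0.2, 0.5), ("or", 1.5, 1.7)]
    print(annotate_pauses(timed))
```

Text in this form is only useful with a model whose second stage was trained on pause-annotated data; as noted above, the models in Google Drive were trained for the first stage only.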

MeteorBurn commented 6 years ago

How can we use the demo-model to punctuate the text as in the demo project?