ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License

post punctuator steps? #12

Open cozec opened 7 years ago

cozec commented 7 years ago

A question about the steps after running this: cat data.dev.txt | python punctuator.py

We get a text file containing the result, with tokens such as ',COMMA' and '.PERIOD' inside. To generate the final result, we assume the following steps:

  1. Replace each punctuation token with the real punctuation mark
  2. Capitalize the word that follows '.PERIOD'

Is this the right understanding?

ottokart commented 7 years ago

Yes, that's about correct (?QUESTIONMARK and !EXCLAMATIONMARK should also be taken into account).

I added a conversion script in the last commit. You can use it like this: python convert_to_readable.py <model_output_path> <readable_output_path> <1/0 - add newlines at end-of-sentence>
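
For readers who want to see the idea in code, here is a minimal sketch of the two steps discussed above. It is not the repository's convert_to_readable.py. The function name to_readable is hypothetical, and the sketch assumes the model output is whitespace-separated and uses only the four tokens named in this thread.

```python
# Hypothetical helper, NOT the repository's convert_to_readable.py.
# Assumption: tokens are whitespace-separated and only the four
# punctuation tokens mentioned in this thread appear in the output.
PUNCTUATION = {
    ",COMMA": ",",
    ".PERIOD": ".",
    "?QUESTIONMARK": "?",
    "!EXCLAMATIONMARK": "!",
}
SENTENCE_END = {".", "?", "!"}


def to_readable(text):
    """Replace punctuation tokens and capitalize sentence-initial words."""
    words = []
    capitalize_next = True  # the first word starts a sentence
    for token in text.split():
        mark = PUNCTUATION.get(token)
        if mark is not None:
            if words:
                # Attach the mark to the preceding word (no space before it).
                # A mark with no preceding word is silently dropped.
                words[-1] += mark
            if mark in SENTENCE_END:
                capitalize_next = True
            continue
        if capitalize_next:
            # Uppercase only the first character; leave the rest untouched.
            token = token[:1].upper() + token[1:]
            capitalize_next = False
        words.append(token)
    return " ".join(words)


if __name__ == "__main__":
    sample = "hello world .PERIOD how are you ?QUESTIONMARK"
    print(to_readable(sample))
```

The sketch treats '?QUESTIONMARK' and '!EXCLAMATIONMARK' like '.PERIOD' when deciding what to capitalize, as noted above. Any other tokens the model may emit would need to be added to the mapping.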

cozec commented 7 years ago

Thanks a lot! Just added one more 'period' at the last line.

acerock6 commented 6 years ago

@cozec can you help me train a model with a processed file that I have? Thanks.