ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License
657 stars 195 forks

Change .PERIOD ,COMMA to . and , #80

Open campConservative opened 1 year ago

campConservative commented 1 year ago

I am using your punctuator for a school project I’m working on, and I’m not good at Python or ML. I noticed that when I punctuate my text, the output comes out as follows:

“this is ninety nine percent invisible ,COMMA i'm ,COMMA roman mars .PERIOD it started with a place called the stone wall in gay bars had been raided by police for decades ,COMMA......

I tried changing this code in data.py as follows:

PUNCTUATION_VOCABULARY = [SPACE, ",", ".", "?", "!", ":", ";", "-"]
#PUNCTUATION_MAPPING = {}

But the output still comes up with ,COMMA and .PERIOD. How can I fix this to show only , and .? The demo site only shows , and .

Thank you

ottokart commented 1 year ago

You can just use a simple post-processing step to convert the output to a more readable format:

sed -e 's/ ,COMMA/,/g;s/ \.PERIOD/./g;s/ ?QUESTIONMARK/?/g;s/ !EXCLAMATIONMARK/!/g;s/ :COLON/:/g;s/ ;SEMICOLON/;/g;s/ -DASH/ -/g' text.txt > text.clean.txt

where text.txt is the raw output with .PERIOD etc. and text.clean.txt is the clean output.
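
If you would rather stay in Python than call sed, the same post-processing can be sketched roughly as below. Note that `clean_punctuation` and `TOKEN_TO_SYMBOL` are names made up for this example and are not part of punctuator2; the token list is simply copied from the sed command above, so adjust it if your output uses other tokens.

```python
import re

# Hypothetical helper table (not part of punctuator2): each output token
# mapped to the plain symbol, mirroring the sed command above.
TOKEN_TO_SYMBOL = {
    ",COMMA": ",",
    ".PERIOD": ".",
    "?QUESTIONMARK": "?",
    "!EXCLAMATIONMARK": "!",
    ":COLON": ":",
    ";SEMICOLON": ";",
}

# Match a single space followed by any known token; re.escape keeps
# characters such as "." and "?" literal.
_TOKEN_PATTERN = re.compile(
    " (" + "|".join(re.escape(token) for token in TOKEN_TO_SYMBOL) + ")"
)


def clean_punctuation(text):
    """Replace ' ,COMMA'-style tokens with the bare punctuation symbol."""
    # Drop the space before the token so the symbol attaches to the word.
    text = _TOKEN_PATTERN.sub(lambda m: TOKEN_TO_SYMBOL[m.group(1)], text)
    # The dash keeps its leading space, as in the sed version.
    return text.replace(" -DASH", " -")
```

To use it, read the raw output (text.txt in the example above), pass the contents through `clean_punctuation`, and write the result to text.clean.txt.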

campConservative commented 1 year ago

Ok yes this makes sense. Once I have the punctuation in place it's easy to convert, thanks for your help!