ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License
659 stars, 195 forks

How to run pre-trained model in local cpu #24

Open saisrinivas047 opened 6 years ago

saisrinivas047 commented 6 years ago

Hi @ottokart, I am new to this. I downloaded the pre-trained .pcl model. Can someone tell me how to use this file to add punctuation to text?

studiawan commented 6 years ago

As described in the README, try this:

cat input.txt | python punctuator.py <model_path> <model_output_path>

Replace input.txt with the text file you want punctuated, <model_path> with the location of the pre-trained .pcl model file you downloaded, and <model_output_path> with the file where the punctuated output should be written.
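If you would rather start it from Python than from the shell, the same call can be sketched roughly as below. This is only an illustration of the command above, not something from the repo: build_command and run_punctuator are made-up helper names, and it assumes punctuator.py is in the current directory and that "python" on your PATH is the interpreter the script needs.

```python
import subprocess


def build_command(model_path, output_path, script="punctuator.py"):
    # Same argument order as the shell command: model file first, output file second.
    # Both arguments are full paths that include the file name.
    return ["python", script, model_path, output_path]


def run_punctuator(input_path, model_path, output_path):
    # Feed the input file to the script on stdin, which is what
    # `cat input.txt |` does in the shell version.
    with open(input_path, "rb") as source:
        subprocess.run(build_command(model_path, output_path), stdin=source, check=True)
```

For example, run_punctuator("input.txt", "model.pcl", "output.txt") would correspond to the shell command above, with model.pcl standing in for wherever you saved the downloaded model.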

acerock6 commented 6 years ago

I am getting some gibberish output when I try to run the model on my input file. The script runs, but it produces what looks like random output, as you can see in the image. Can you help me run the model on my training data? The README doesn't help much for beginners. [image]

Or can you give me a sample input for the <model_path> and <model_output_path>? Do these paths need to include the file name as well?

chrisspen commented 4 years ago

@acerock6 I got it working. I cloned the repo, created the folder ./data in my project root, and then downloaded the file Demo-Europarl-EN.pcl into my ./data folder.

I then created a sample file called test.txt and added some unpunctuated text. Then I ran:

cat test.txt | python punctuator.py ./data/Demo-Europarl-EN.pcl output.txt

It ran for about 5 minutes, but finally generated the file output.txt containing my sample text with punctuation symbols added to it. Note that this output isn't directly readable: it just inserts strings like ".PERIOD" into your text to denote a period. To get the final form, run python convert_to_readable.py output.txt output2.txt, which converts the punctuation symbols to normal punctuation.
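To give an idea of what that conversion step involves, here is a minimal sketch. It is not the code in convert_to_readable.py. It assumes every token is the punctuation character followed by an upper-case name (".PERIOD", and by the same pattern something like ",COMMA" or "?QUESTIONMARK"), and to_readable is a hypothetical name. Dash-style tokens are left alone to keep it short.

```python
import re

# Assumed token shape: one punctuation character plus an upper-case name,
# e.g. ".PERIOD". The name itself is not checked.
TOKEN = re.compile(r"^([.,?!:;])[A-Z]+$")
SENTENCE_END = {".", "?", "!"}


def to_readable(text):
    words = []
    capitalize_next = True  # the first word starts a sentence
    for item in text.split():
        match = TOKEN.match(item)
        if match:
            symbol = match.group(1)
            if words:
                # Attach the symbol to the preceding word.
                words[-1] += symbol
            if symbol in SENTENCE_END:
                capitalize_next = True
        else:
            if capitalize_next:
                item = item[0].upper() + item[1:]
                capitalize_next = False
            words.append(item)
    return " ".join(words)


print(to_readable("hello world .PERIOD how are you ?QUESTIONMARK"))
# prints: Hello world. How are you?
```

For real output, stick with the script from the repo. This is only meant to show why the intermediate file looks the way it does.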