ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License
659 stars 195 forks source link

Sample data files #3

Closed roozbehsadeghian closed 7 years ago

roozbehsadeghian commented 7 years ago

Hi,

I was trying to run the code but I was stucked in reading the data files (train, dev). Is it possible for you to provide some sample data?

ottokart commented 7 years ago

Hi,

There's an example in the README.md of how the sentences should look like in your dataset: Example: to be ,COMMA or not to be ,COMMA that is the question .PERIOD You might run into problems if the test dataset has too few words in it. If your minibatch size is 128 and sequence length 200 (the default values) then training, development and test datasets should have at least 25600 words each.

ottokart commented 7 years ago

See ./example directory for an example dataset