ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License
657 stars 195 forks

data set #54

Open cymqqqq opened 4 years ago

cymqqqq commented 4 years ago

What dataset did you use in the paper? Could you give me a link to it? Thanks.

ottokart commented 4 years ago

The TED dataset was preprocessed by the authors of http://www.lrec-conf.org/proceedings/lrec2016/pdf/103_Paper.pdf, and the resulting dataset is shared at: https://drive.google.com/file/d/0B13Cc1a7ebTuMElFWGlYcUlVZ0k/view

I used this simple script to convert the format of the files: https://drive.google.com/open?id=1sW23C4kqRJ6rDSBurco8_0lJ3VZJIkta

cymqqqq commented 4 years ago

Awesome! Thanks!

aoao1992 commented 4 years ago

When I run python data.py data/, line 206 raises ZeroDivisionError: division by zero. That's strange. Did something go wrong? Any help would be appreciated.