vackosar / keras-punctuator

Experimental project to punctuate text using an embedding layer, a single convolutional layer and an output softmax layer, written in Keras.
MIT License

Training data size #5

BoPengGit closed this issue 6 years ago

BoPengGit commented 6 years ago

Hi Vaclav,

What was the training data size to train the final model?

It says here:

Europarl v7: http://hltshare.fbk.eu/IWSLT2012/training-parallel-europarl.tgz
News Crawl from WMT 2012 (en, fr), 7GB: http://hltshare.fbk.eu/IWSLT2012/training-monolingual-newsshuffled.tgz
Additional data: http://hltc.cs.ust.hk/iwslt/index.php/evaluation-campaign/ted-task.html

For News Crawl, is 7GB the size of the English text alone?

Thanks.

vackosar commented 6 years ago

I used News Crawl for the Android app Youtube reader. I don't remember the size. I think the model converged quickly, without using all of the data.

vackosar commented 6 years ago

I mean that I used only part of the data and the model converged.