Open 350d opened 6 years ago
Hello! I've successfully collected training data and created a model for a specific language. Now I have a problem: the network can't handle simple language rules, like a comma before some specific word or a colon between other words. How can I predefine some custom rules for this model? Thank you!
Hi! If you have a sufficient amount of decent-quality training data (10M - 40M words), then the model should be able to learn most of the rules on its own (although it will of course make some mistakes). This toolkit currently does not support manually created custom rules, so you would have to write this extension yourself. You can also try training some other probabilistic model(s) and interpolating their probabilities with the output of this model (this would also be a custom extension).
Best, Ottokar
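To illustrate the interpolation idea mentioned above, here is a minimal sketch of linear interpolation between two probability distributions over punctuation labels. It is not part of the toolkit: the function names, the label names (`NONE`, `COMMA`, `COLON`), the example probabilities, and the interpolation weight are all assumptions made for illustration. The second distribution could come from any other model, including a hand-written rule expressed as probabilities.

```python
def interpolate(p_model, p_other, weight=0.5):
    """Linearly interpolate two distributions over the same label set.

    `weight` is the share given to `p_model`; the rest goes to `p_other`.
    Both inputs are dicts mapping a label to its probability.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    labels = set(p_model) | set(p_other)
    mixed = {
        label: weight * p_model.get(label, 0.0)
        + (1.0 - weight) * p_other.get(label, 0.0)
        for label in labels
    }
    total = sum(mixed.values())
    if total <= 0.0:
        raise ValueError("interpolated distribution has no probability mass")
    # Renormalize in case the inputs did not sum exactly to 1.
    return {label: p / total for label, p in mixed.items()}


def best_label(dist):
    """Return the label with the highest probability."""
    return max(dist, key=dist.get)


# Hypothetical outputs for the slot before one word (made-up numbers).
network = {"NONE": 0.6, "COMMA": 0.3, "COLON": 0.1}
rule_model = {"NONE": 0.1, "COMMA": 0.9, "COLON": 0.0}

mixed = interpolate(network, rule_model, weight=0.5)
print(best_label(mixed))
```

With these made-up numbers, the network alone would choose no punctuation, while the interpolated distribution favors the comma. The weight would normally be tuned on held-out data.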
Is there an upper limit for the training data, or the more the better?
Hello! I'm trying to add support for a different language here. I have training data with about 100,000 sentences and can increase it to 1M or so. How many sentences do I need to start training, and how do I need to update the `./run.sh` file in my case (the input file name is already updated)? I've already tried using the total number of lines and half of it, and got errors. Update: OK, the OS X `head` and `tail` don't accept negative values; fixed by updating to coreutils. Will let you know about my progress here...
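For anyone hitting the same `head`/`tail` problem without installing coreutils, here is a minimal portable sketch in Python. It assumes the goal is to hold out the last lines of a corpus as a development set and train on the rest; that assumption, the function names, and the example data are illustrative and not taken from `run.sh`. The first helper mimics what GNU `head` does with a negative line count (everything except the last `k` lines).

```python
def all_but_last(lines, k):
    """Return every line except the last k (like GNU `head -n -k`)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # max(...) guards against k being larger than the number of lines.
    return lines[:max(len(lines) - k, 0)]


def last(lines, k):
    """Return the last k lines (like `tail -n k`)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return lines[max(len(lines) - k, 0):]


def split_train_dev(lines, dev_size):
    """Split a list of lines into a training part and a trailing dev part."""
    return all_but_last(lines, dev_size), last(lines, dev_size)


# Hypothetical five-sentence corpus.
corpus = ["s1", "s2", "s3", "s4", "s5"]
train, dev = split_train_dev(corpus, 2)
print(len(train), len(dev))
```

For a real corpus, the lines would be read from the input file and each part written back out to its own file.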