Open cjjjy opened 5 years ago
I've got the same problem. I just started to reproduce the 1-billion-word results (I get the same md5sums).
$HOME/faster-rnnlm/faster-rnnlm/rnnlm --hidden 128 --hidden-type gru-insyn --nce 50 --direct-order 4 --direct 1000 --alpha 0.05 --bptt-skip 8 --bptt 32 -rnnlm h128 --train train.txt --valid valid.txt --threads 12
Constructed a vocabulary: 793470 words
Constructing a new net (no model file is found)
Constructing RNN: layer_size=128, layer_type=gru-insyn, layer_count=1, maxent_hash_size=999772200, maxent_order=4, vocab_size=793470, use_nce=1
Constructing NCE: layer_size=128, maxent_hash_size=999772200, cuda=1, ln(Z)=9.000000
Constructed UnigramNoiseGenerator: power=1.000, mincells(param)=5.000, mincells(real)=5
Initial entropy (bits) valid: 19.59778
Epoch 1 lr: 5.00e-02/1.00e-01 progress: 99.65% 152.31 Kwords/sec entropy (bits) valid: nan elapsed: 87.7m+3.9m Awful: Nnet rejected
EDIT: Probing further, this appears to be a very common problem with this software; the most helpful reply I found is in https://github.com/yandex/faster-rnnlm/issues/17
ALSO: My run above wasn't quite fair: I used --hidden 128, while the example parameters called for --hidden 256. The --hidden 256 run has now finished and beats the published result of 6.426476.
$HOME/faster-rnnlm/faster-rnnlm/rnnlm --hidden 256 --hidden-type gru-insyn --nce 50 --direct-order 4 --direct 1000 --alpha 0.05 --bptt-skip 8 --bptt 32 -rnnlm h256 --train train.txt --valid valid.txt --threads 12
Constructed a vocabulary: 793470 words
Constructing a new net (no model file is found)
Constructing RNN: layer_size=256, layer_type=gru-insyn, layer_count=1, maxent_hash_size=999772200, maxent_order=4, vocab_size=793470, use_nce=1
Constructing NCE: layer_size=256, maxent_hash_size=999772200, cuda=1, ln(Z)=9.000000
Constructed UnigramNoiseGenerator: power=1.000, mincells(param)=5.000, mincells(real)=5
Initial entropy (bits) valid: 19.59782
Epoch 1 lr: 5.00e-02/1.00e-01 progress: 99.27% 44.16 Kwords/sec entropy (bits) valid: 6.82076 elapsed: 302.7m+4.1m
Epoch 2 lr: 5.00e-02/1.00e-01 progress: 99.86% 44.06 Kwords/sec entropy (bits) valid: 6.70054 elapsed: 302.8m+4.0m
Epoch 3 lr: 5.00e-02/1.00e-01 progress: 99.90% 43.98 Kwords/sec entropy (bits) valid: 6.65609 elapsed: 303.3m+4.1m
Epoch 4 lr: 5.00e-02/1.00e-01 progress: 99.92% 44.03 Kwords/sec entropy (bits) valid: 6.61038 elapsed: 303.0m+4.1m
Epoch 5 lr: 5.00e-02/1.00e-01 progress: 99.75% 44.04 Kwords/sec entropy (bits) valid: 6.57961 elapsed: 303.3m+4.1m
Epoch 6 lr: 5.00e-02/1.00e-01 progress: 99.63% 43.62 Kwords/sec entropy (bits) valid: 6.56623 elapsed: 305.9m+4.1m Bad: start lr decay
Epoch 7 lr: 2.50e-02/5.00e-02 progress: 99.52% 44.28 Kwords/sec entropy (bits) valid: 6.42066 elapsed: 301.7m+4.1m
Epoch 8 lr: 1.25e-02/2.50e-02 progress: 99.80% 44.11 Kwords/sec entropy (bits) valid: 6.36528 elapsed: 302.5m+4.1m
Epoch 9 lr: 6.25e-03/1.25e-02 progress: 98.17% 44.18 Kwords/sec entropy (bits) valid: 6.34672 elapsed: 302.8m+4.2m Bad: 1 more to stop
Epoch 10 lr: 3.13e-03/6.25e-03 progress: 99.97% 44.05 Kwords/sec entropy (bits) valid: 6.38948 elapsed: 302.4m+4.1m Awful: Nnet rejected
I tried running the command ./rnnlm -rnnlm model_name -train train.txt -valid valid.txt -hidden 256 -hidden-type gru -nce 20 -alpha 0.01 and it produced the error: entropy (bits) valid: -nan elapsed: 29.3s+0.1s Awful: Nnet rejected
How can I solve this problem?
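For what it's worth, a common first thing to try when NCE training diverges to NaN is lowering the initial learning rate (and letting the built-in decay take over sooner). This is a hedged sketch, not a confirmed fix from the maintainers: it reuses only flags that already appear in this thread, and the smaller --alpha value is an assumption to experiment with.

```shell
# Hypothetical retry of the failing command with a 10x smaller
# initial learning rate (--alpha 0.001 instead of 0.01).
# All other flags are unchanged from the command above; whether
# this avoids the "Awful: Nnet rejected" NaN is an assumption.
./rnnlm -rnnlm model_name -train train.txt -valid valid.txt \
    -hidden 256 -hidden-type gru -nce 20 -alpha 0.001
```

If the first epoch's validation entropy is finite (like the ~6.8 bits seen in the successful h256 run above), the original rate was likely too aggressive for this setup.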