ramin-git opened this issue 9 years ago
Hi!
Our server has 32 GB RAM, an 8-core 3.4 GHz CPU, and 3 GB of GPU memory. When I train the model with NCE and `-use-cuda 1` (no MaxEnt), computing validation entropy takes 43-49 seconds per 500 sentences. The command is `taskset -c 0,1,2,3,4,5,6,7 ../rnnlm -rnnlm model-test -train train-no-one-word-new-uniq-random-dict.txt -valid valid-new-uniq.txt -hidden 200 -hidden-type sigmoid -nce 20 -nce-accurate-test 1 -use-cuda 1 -threads 8 -alpha 0.01 -rmsprop 0.9 -bptt 4 -bptt-skip 10`. Without `-use-cuda` it also takes 43-49 seconds. But training without NCE, with or without MaxEnt, takes only 0.05-0.07 seconds. Is that normal?
That's weird. Does the rnnlm actually learn anything in less than a second? Does the validation entropy decrease?
Sorry, I didn't explain the problem well. By "training without NCE, with/without MaxEnt takes only 0.05-0.07 seconds" I meant that computing validation entropy for each 500 sentences takes only 0.05-0.07 seconds, not that the full training takes 0.05-0.07 seconds.
Yep, that's normal. Validation for Hierarchical Softmax is a few orders of magnitude faster than for NCE.
The problem with NCE is that nothing guarantees the predicted probabilities are stochastic, i.e. that the probabilities of all words sum to one. That's why validation for NCE is so extremely slow: we have to renormalize the probabilities. On the other hand, the predicted probabilities are quite close to the real ones. So if you need probabilities for some kind of rescoring, you can disable nce-accurate-test at test time and compute approximated probabilities very fast.
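The trade-off above can be sketched in a few lines. This is an illustration of the idea, not the toolkit's actual code; the scores below are made-up numbers standing in for a model's unnormalized log-scores:

```python
import math

# Hypothetical unnormalized log-scores for a tiny vocabulary
# (made-up values; an NCE-trained model pushes exp(score) toward
# the true probability, but the sum is not guaranteed to be 1).
scores = {"the": -0.9, "cat": -2.1, "sat": -2.4, "mat": -2.6}

def prob_accurate(word, scores):
    # nce-accurate-test style: renormalize over the whole vocabulary.
    # Correctly stochastic, but costs an exp + sum over every word
    # for each prediction, which is what makes validation slow.
    z = sum(math.exp(s) for s in scores.values())
    return math.exp(scores[word]) / z

def prob_approximate(word, scores):
    # Fast path: trust NCE training and treat exp(score) as the
    # probability directly, skipping the vocabulary-wide sum.
    return math.exp(scores[word])

print(prob_accurate("the", scores))     # exactly normalized
print(prob_approximate("the", scores))  # approximate, much cheaper
```

With a real vocabulary the accurate path sums over tens of thousands of words per prediction, which is why moving it to the GPU pays off.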
Hi! I have read all your answers to the questions asked, and I want to thank you for them. But I need a more detailed explanation, so I'll ask a question that may have been asked before. It is about training rnnlm with CUDA. I understand that CUDA is used only to compute validation entropy during training in NCE mode. Why isn't CUDA used for the training itself, and not only for validation entropy? Is it impossible to use CUDA during training, at least for the matrix operations? Could you explain if you have time?
NCE validation uses only simple operations (like matrix multiplication) and can be implemented efficiently on a GPU.
As for training, some operations work faster on GPU (matrix multiplication) and some work faster on CPU (HS). The CPU-based solution allows the use of HogWild, which makes it faster than a GPU-based one for hidden layers of reasonable size. However, some combination of GPU and CPU may work faster. I haven't tried this yet.
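For readers unfamiliar with HogWild: the idea is that several threads apply sparse SGD updates to shared parameters with no locking at all, accepting occasional lost updates in exchange for zero synchronization cost. A toy sketch (illustration only, not the toolkit's implementation; the "gradient" here just pulls each weight toward 1):

```python
import threading
import random

DIM = 1000
weights = [0.0] * DIM  # shared parameter vector, updated without locks

def worker(seed, steps, lr=0.1):
    rng = random.Random(seed)
    for _ in range(steps):
        i = rng.randrange(DIM)     # sparse update touches one coordinate
        grad = weights[i] - 1.0    # toy gradient: minimize (w - 1)^2 / 2
        weights[i] -= lr * grad    # unsynchronized read-modify-write

# Four lock-free workers hammering the same vector in parallel.
threads = [threading.Thread(target=worker, args=(s, 20000)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Despite the races, nearly every coordinate converges close to 1.
print(sum(weights) / DIM)
```

Because RNN gradients for one sentence touch only a small slice of the parameters, collisions between threads are rare, which is why the lock-free scheme works well on CPU but has no direct GPU equivalent.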
Hi, everyone. I have some questions about faster-rnnlm. First question: I want to use this toolkit with the option -direct 1000, but I get the error `CUDA ERROR: Failed to allocate cuda memory for maxent` (out of memory). I know it is due to -direct 1000, because when I use -direct 400 the error doesn't appear. Our GPU has 3 GB of memory. Is there any way to use -direct 1000 without this error? Second question: you use the GPU only when computing validation or test entropy. Why don't you use the GPU during training? What is the reason? Don't you think it would be faster to use the GPU during training?
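A back-of-the-envelope check on the first question. If -direct is interpreted the same way as in Mikolov's original rnnlm, i.e. the maxent hash size in millions of 32-bit weights (an assumption on my part, worth verifying against the toolkit's docs), then the numbers line up with the out-of-memory error:

```python
def maxent_bytes(direct_millions, bytes_per_weight=4):
    # Assumption: -direct N allocates N million float32 maxent weights,
    # all of which must fit in GPU memory for the CUDA validation path.
    return direct_millions * 1_000_000 * bytes_per_weight

GIB = 1024 ** 3
print(maxent_bytes(1000) / GIB)  # ~3.73 GiB -> exceeds a 3 GB card
print(maxent_bytes(400) / GIB)   # ~1.49 GiB -> fits comfortably
```

Under that reading, -direct 1000 simply cannot fit on a 3 GB GPU; the options would be a smaller hash, a larger card, or running validation with -use-cuda 0.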