Closed yongduek closed 7 years ago
We haven't added functional GPU support to the final version of the code, but I remember that in initial experiments, the GPU-based code didn't give any significant speed-ups since the main bottleneck lies in the recurrent (LSTM) modules. However, this might not be the case anymore with recent improvements to the libraries.
The error you report occurs because the call `targets:cuda()` is invalid: `targets` is a table, not a torch tensor. You can fix this by replacing line 255 in `NeuralQLearner.lua` with `if self.gpu >= 0 then targets = {targets[1]:cuda(), targets[2]:cuda()} end`.
There might be other places that require similar changes. If you end up fixing them, please consider submitting a pull request!
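For the other places that may need the same change, a small recursive helper could avoid repeating the per-element conversion. The following is a minimal sketch, not code from the repository: `recursiveCuda` and `stubTensor` are hypothetical names, and the stub stands in for a real torch tensor so the snippet runs in plain Lua without torch or cutorch installed.

```lua
-- Hypothetical helper (not part of the repository): call :cuda() on a tensor,
-- or on every tensor inside a (possibly nested) plain Lua table.
local function recursiveCuda(x)
  -- A plain Lua table has no 'cuda' field, which is why targets:cuda() fails.
  -- Assumption: with real torch, torch.isTensor(x) would be a stricter check.
  if type(x) == 'table' and x.cuda == nil then
    local out = {}
    for k, v in pairs(x) do
      out[k] = recursiveCuda(v)
    end
    return out
  end
  return x:cuda()
end

-- Stub standing in for a torch tensor, for illustration only.
local function stubTensor(name)
  return {
    name = name,
    device = 'cpu',
    cuda = function(self)
      return { name = self.name, device = 'gpu', cuda = self.cuda }
    end,
  }
end

local targets = { stubTensor('a'), stubTensor('b') }

-- Calling :cuda() on the table itself reproduces the reported failure.
local ok = pcall(function() return targets:cuda() end)
print(ok)  -- false

-- Converting element by element succeeds.
local gpuTargets = recursiveCuda(targets)
print(gpuTargets[1].device, gpuTargets[2].device)
```

Under these assumptions, line 255 could then read `if self.gpu >= 0 then targets = recursiveCuda(targets) end`, and the same helper could be reused wherever a table of tensors is moved to the GPU.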
The script run_cpu runs well and learns the game policy. Since training on a GPU should normally be faster, I changed the gpu option in the file 'run_cpu' to gpu = 0. This option does not seem to work. Is there a way to use the GPU for the learning process, please?
Below is what I have tried.
Setting gpu=0 only produced the following error:
... state dim multiplier 1
/Users/yndk/torch/install/bin/luajit: /Users/yndk/torch/install/share/lua/5.1/torch/Tensor.lua:238: attempt to index a nil value
stack traceback:
    /Users/yndk/torch/install/share/lua/5.1/torch/Tensor.lua:238: in function 'type'
    /Users/yndk/torch/install/share/lua/5.1/nn/utils.lua:52: in function 'recursiveType'
    /Users/yndk/torch/install/share/lua/5.1/nn/Module.lua:126: in function 'type'
    /Users/yndk/torch/install/share/lua/5.1/nn/utils.lua:45: in function 'recursiveType'
    /Users/yndk/torch/install/share/lua/5.1/nn/utils.lua:41: in function 'recursiveType'
    /Users/yndk/torch/install/share/lua/5.1/nn/Module.lua:126: in function 'type'
    /Users/yndk/torch/install/share/lua/5.1/nn/utils.lua:45: in function 'recursiveType'
    /Users/yndk/torch/install/share/lua/5.1/nn/utils.lua:41: in function 'recursiveType'
    /Users/yndk/torch/install/share/lua/5.1/nn/Module.lua:126: in function 'cuda'
    ./lstm_embedding.lua:149: in function 'network'
    ./NeuralQLearner.lua:90: in function '__init'
    /Users/yndk/torch/install/share/lua/5.1/torch/init.lua:91: in function </Users/yndk/torch/install/share/lua/5.1/torch/init.lua:87>
    [C]: at 0x0a6b3250
    agent.lua:158: in main chunk
    [C]: in function 'dofile'
    ...yndk/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010a3a9bd0
* Now, `$ run_cpu 1` got through the first learning loop. However, as soon as the second loop started, it produced the following error messages:
Network weight sum: 131.26902770996
Saved: logs/run1/DQN.t7
/Users/yndk/torch/install/bin/luajit: ./NeuralQLearner.lua:255: attempt to call method 'cuda' (a nil value)
stack traceback:
    ./NeuralQLearner.lua:255: in function 'getQUpdate'
    ./NeuralQLearner.lua:268: in function 'qLearnMinibatch'
    ./NeuralQLearner.lua:412: in function 'perceive'
    agent.lua:200: in main chunk
    [C]: in function 'dofile'
    ...yndk/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010f36ebd0