Closed: sridharmahadevan closed this issue 9 years ago.
I think this is a duplicate of Issue #3. When I get a few minutes I will update the README to emphasize that this package requires recent GitHub versions of Theano, Pylearn2 and ALE. I've considered forking those packages to provide a set of known-working dependencies, but I would rather keep up to date by fixing issues as they arise.
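For anyone landing here later, a minimal sketch of grabbing the bleeding-edge dependencies straight from GitHub (pip's git support is just one option; Pylearn2 is often installed in develop mode instead, and ALE still needs its manual build):
pip install --upgrade git+https://github.com/Theano/Theano.git
pip install --upgrade git+https://github.com/lisa-lab/pylearn2.git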
Yes, I just saw that and rebuilt Theano. I've started another run now...looks like it is finally working!
Thanks for your response!
python ale_run.py --exp_pref data
RL-Glue Version 3.04, Build 909
RL-Glue is listening for connections on port=4096
A.L.E: Arcade Learning Environment (version 0.4.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
  ROM file:          /home/mahadeva/Documents/code/deep_rl/roms/breakout.bin
  Cart Name:         Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:          f34f08e5eb96e500e851a80be3277a56
  Display Format:    AUTO-DETECT ==> NTSC
  ROM Size:          2048
  Bankswitch Type:   AUTO-DETECT ==> 2K
Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
RL-Glue :: Experiment connected.
Initializing ALE RL-Glue ...
RL-Glue :: Environment connected.
Using gpu device 1: GeForce GTX 980
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Agent Codec Connected
RL-Glue :: Agent connected.
(32, 4, 80, 80)
(4, 80, 80, 32)
(16, 19.0, 19.0, 32)
(32, 9.0, 9.0, 32)
(32, 32, 9.0, 9.0)
(32, 256)
(32, 18)
/home/mahadeva/Downloads/Theano/theano/gof/cmodule.py:289: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  rval = __import__(module_name, {}, {}, [module_name])
OPENING data_01-14-15-14_0p0001_0p9/results.csv
training epoch: 1 steps_left: 50000
Simulated at a rate of 55.4920442312/s
Average loss: 0.100790430055
training epoch: 1 steps_left: 49780
Simulated at a rate of 58.814803058/s
Average loss: 0.0572934835375
training epoch: 1 steps_left: 49597
Simulated at a rate of 60.2730953073/s
Average loss: 0.0623961077461
training epoch: 1 steps_left: 49271
Simulated at a rate of 53.5896811515/s
Average loss: 0.0776437076656
training epoch: 1 steps_left: 49144
I ran into a bit of difficulty plotting the results.csv file. Here is the output. Any suggestions?
Thanks,
ls -l data_01-14-15-14_0p0001_0p9/
total 17168
-rw-rw-r-- 1 mahadeva mahadeva 15754 Jan 14 10:44 learning.csv
-rw-rw-r-- 1 mahadeva mahadeva 17558418 Jan 14 10:29 network_file_1.pkl
-rw-rw-r-- 1 mahadeva mahadeva 95 Jan 14 10:31 results.csv
mahadeva@manifold:~/Documents/code/deep_rl/deep_q_rl$ python plot_results.py data_01-14-15-14_0p0001_0p9/results.csv
Traceback (most recent call last):
File "plot_results.py", line 22, in
I haven't seen that before. If you post the contents of results.csv (should be at most a few lines long), I can take a look at it.
Never mind, it works fine. I think I was trying to plot the results.csv file when it only had two data points. That causes the error I sent you. Now, it has 4 data points and it plots fine (see attached).
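For reference, a minimal plotting sketch (not the repo's plot_results.py, just an illustration; the column layout is assumed from the header discussed later in this thread). The ndmin=2 argument guards against the degenerate case where the file has only a single data row:
# Illustrative only; assumes columns epoch, num_episodes, total_reward, reward_per_episode, mean_q.
import numpy as np
import matplotlib.pyplot as plt

results = np.loadtxt('results.csv', delimiter=',', skiprows=1, ndmin=2)  # skip the header row
plt.plot(results[:, 0], results[:, 3], '-o')
plt.xlabel('epoch')
plt.ylabel('reward per episode')
plt.show()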
Great! The README has been updated.
Two additional questions: I noticed CPU usage is somewhat low. I'm using a beefy 16-processor machine and I've set Theano to use all 16 threads. Is there anything else I can do to speed it up?
Also, I have two GPUs. Is there any way to use both? Theano has a flag that tells it which GPU to use; I wonder if it can be made to use both.
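For what it's worth, Theano's device selection goes through THEANO_FLAGS (or ~/.theanorc), so pinning a run to a particular card looks roughly like the lines below. Theano only uses one device per process, so the practical way to use both GPUs is to run two independent experiments side by side rather than splitting a single run:
THEANO_FLAGS='device=gpu0,floatX=float32' python ale_run.py --exp_pref data
THEANO_FLAGS='device=gpu1,floatX=float32' python ale_run.py --exp_pref data2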
I haven't done any benchmarking for quite a while, but my recollection is that most of the bottleneck is on the GPU in Theano, and most of that time is spent computing the convolutions. This is followed by ALE (single thread), and overhead associated with RL-Glue. There isn't much for Theano to parallelize that isn't already running on the GPU.
I don't have any experience trying to get multiple GPUs working with Theano. I'm skeptical that there would be a huge speedup. I've run this code on several different GPUs with widely varying numbers of cores. The performance seems to track the frequency of the GPU more than the number of available cores.
You might get some speedup by trying out some of the more recent convolution implementations that are being integrated into Theano -- in particular cuDNN. Let me know if you have any luck.
-Nathan
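If it helps, a quick sanity check that Theano can actually see cuDNN once it's installed; this uses the dnn_available helper from the old CUDA backend, so treat the exact module path as an assumption about your Theano version:
python -c "from theano.sandbox.cuda import dnn; print dnn.dnn_available()"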
OK, I realized I had not downloaded cuDNN, so it's possible that Theano will go faster with that. I just downloaded and installed it from NVIDIA.
My latest problem is that I can't get the SDL display to work. I've compiled and installed the SDL libraries, but the build doesn't find the header file SDL.h (I've added the necessary include flag, -I/place/where/SDL/is, to the makefile, but no luck so far).
When you built the Arcade Learning Environment, did you set the flag in makefile.unix?
If you look at page 11 of the manual included with the ALE code, it says:
To compile with SDL support, you should set USE_SDL=1 in the ALE makefile.
Hope that helps?
When choosing a trained network to watch, you'd probably be best off training for 100 epochs and then choosing the .pkl file with the highest per-episode score. For Breakout, a network with an average score over 60 would be fun to watch.
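A tiny sketch of picking that best epoch automatically from results.csv (the column index and the network_file_<epoch>.pkl naming are assumptions based on the files shown above):
import numpy as np
results = np.loadtxt('results.csv', delimiter=',', skiprows=1, ndmin=2)
best_epoch = int(results[np.argmax(results[:, 3]), 0])  # column 3 ~ reward per episode
print 'best network: network_file_%d.pkl' % best_epoch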
Yes, that's the problem. I can compile ALE with the USE_SDL flag set to 0. If I set this flag to 1, compilation fails because it can't find SDL.h (whose location I have set in the INCLUDES directive in the Makefile).
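One thing that can help track this down: SDL 1.2 ships an sdl-config helper that prints the exact compiler and linker flags the build needs, so you can see which -I path belongs in the makefile (assuming sdl-config is on your PATH):
sdl-config --cflags   # e.g. -I/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT
sdl-config --libs     # e.g. -L/usr/lib -lSDL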
OK, everything works fine now! I had to install some missing SDL packages, and also add some INCLUDE links.
The program's been running for a day now, and it does not seem to have learned a very good policy for Breakout. I'll give it another day and see. The learning definitely needs to be accelerated, perhaps with some parameter tuning.
Thanks for all your help!
In my Synaptic package manager, it shows I have the dev libraries installed for:
• libsdl2
• libsdl2-gfx
• libsdl2-image
Was it those libraries which were missing?
Yes, except I installed the SDL versions of those, not SDL2. The code is set up for SDL; otherwise you have to go and change all the includes from "SDL/SDL.h" to "SDL2/SDL.h".
So, just to clear things up, I think/guess you need to have the following installed:
• libsdl1.2-dev
• libsdl-gfx1.2-dev
• libsdl-image1.2-dev
With those, maybe you don't need to change any INCLUDE links? Hopefully that will help someone in the future; an install one-liner is sketched below.
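For completeness, the matching install command on Debian/Ubuntu (package names as listed above; they may differ slightly by release):
sudo apt-get install libsdl1.2-dev libsdl-gfx1.2-dev libsdl-image1.2-dev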
I'd be really interested in your parameter tunings if you did manage to find a set which gave you faster learning.
Thanks, Ajay
Discount rate of 0.95 helps quite a bit. (We need a forum or something; an issue tracker on GitHub doesn't seem like the right place for these conversations.)
I started another run since I now have cuDNN installed. It makes a difference since the game is now being simulated at a rate of around 65 steps/second (which is quite a bit faster than before). I also increased the discount factor to 0.95. Will check on it in the evening to see how it's progressing.
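For anyone wondering what that change actually does: the discount enters the one-step Q-learning target, so a larger value makes the agent weight rewards that arrive several steps in the future more heavily (relevant in Breakout, where the point arrives well after the paddle hit). A minimal sketch of the target, with all names illustrative rather than taken from the repo:
import numpy as np

def q_learning_target(reward, next_q_values, terminal, discount=0.95):
    # One-step target: r + gamma * max_a' Q(s', a'), with no bootstrap on terminal steps.
    return reward + (1.0 - terminal) * discount * np.max(next_q_values)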
I set up a forum; its URL is:
http://deep-learning.boards.net/
If anyone wants to take over or share the admin duties, that would be great.
@AjayTalati Thanks for taking the initiative to set that up. I'd prefer to use a Google web forum, if only because that's the mechanism used by other projects related to this one. I've set something up at: https://groups.google.com/forum/#!forum/deep-q-learning
The results I am getting seem quite a bit worse than those in the published NIPS paper. Mean Q on Breakout after 30 epochs, as shown in the paper, is around 2; mine is 10 times lower, around 0.2. Similarly, reward per epoch should be around 50 and stable; mine is very unstable, hovering between 3 and 10.
Clearly, something is amiss. Has anyone gotten better results than this?
30 epochs is rather too short a run to say anything conclusive.
If you carry on to 100 epochs, your reward-per-episode score should get to around 60. That's the 4th column in the results.csv file.
Maybe Nathan can have a look at the headers in the CSV file? I think there might be a labelling issue. It's line 200 of rl_glue_ale_agent.py: the 4th column should probably be labelled reward_per_episode instead of the current reward_per_epoch. So I think it should be,
self.results_file.write(\
'epoch,num_episodes,total_reward,reward_per_episode,mean_q\n')
As for the mean_q value, I'm still unsure of how exactly to calculate it, so if you know how to do it, I'd like to understand.
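My understanding, hedged and based on the NIPS paper rather than on this repo's code, is that the mean Q statistic is the average of the maximum predicted Q-value over a fixed set of held-out states collected once before training, roughly:
import numpy as np

def mean_max_q(q_values):
    # q_values: (num_holdout_states, num_actions) array of network outputs.
    return np.mean(np.max(q_values, axis=1))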
Sridhar, perhaps we should transfer this conversation over to the forum Nathan has set up?
Yes, good idea. I posted my question there. After 80 episodes, my results are quite poor. It has not learned much, if anything.
Hi,
I've been trying to get your code working, and I'm almost there, but I'm still getting a seg fault. The system runs, but it's not saving out any results. Here is what I get when I run your script. Thanks for your help!
python ale_run.py --exp_pref data | more
RL-Glue Version 3.04, Build 909
A.L.E: Arcade Learning Environment (version 0.4.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
  ROM file:          /home/mahadeva/Documents/code/deep_rl/roms/breakout.bin
  Cart Name:         Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:          f34f08e5eb96e500e851a80be3277a56
  Display Format:    AUTO-DETECT ==> NTSC
  ROM Size:          2048
  Bankswitch Type:   AUTO-DETECT ==> 2K
Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Initializing ALE RL-Glue ...
Using gpu device 1: GeForce GTX 980
In file included from /usr/include/python2.7/numpy/ndarraytypes.h:1761:0,
                 from /usr/include/python2.7/numpy/ndarrayobject.h:17,
                 from /usr/include/python2.7/numpy/arrayobject.h:4,
                 from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c:239:
/usr/include/python2.7/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
 ^
In file included from /usr/include/python2.7/numpy/ndarrayobject.h:26:0,
                 from /usr/include/python2.7/numpy/arrayobject.h:4,
                 from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c:239:
/usr/include/python2.7/numpy/__multiarray_api.h:1629:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
 _import_array(void)
 ^
In file included from /usr/include/python2.7/numpy/ufuncobject.h:327:0,
                 from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c
--More--
Traceback (most recent call last):
File "./rl_glue_ale_agent.py", line 430, in <module>
main()
File "./rl_glue_ale_agent.py", line 426, in main
AgentLoader.loadAgent(NeuralAgent())
File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/AgentLoader.py", line 58, in loadAgent
client.runAgentEventLoop()
File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 144, in runAgentEventLoop
switch[agentState](self)
File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 137, in
Network.kAgentInit: lambda self: self.onAgentInit(),
File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 43, in onAgentInit
self.agent.agent_init(taskSpec)
File "./rl_glue_ale_agent.py", line 158, in agent_init
self.network = self._init_network()
File "./rl_glue_ale_agent.py", line 192, in _init_network
approximator='cuda_conv')
File "/home/mahadeva/Documents/code/deep_rl/deep_q_rl/cnn_q_learner.py", line 168, in __init
target = theano.gradient.consider_constant(target)
AttributeError: 'module' object has no attribute 'consider_constant'
Segmentation fault (core dumped)
:240:
/usr/include/python2.7/numpy/__ufunc_api.h:241:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
_import_umath(void)
^
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Agent Codec Connected
(32, 4, 80, 80)
(4, 80, 80, 32)
(16, 19.0, 19.0, 32)
(32, 9.0, 9.0, 32)
(32, 32, 9.0, 9.0)
(32, 256)
(32, 18)
training epoch: 1 steps_left: 50000
training epoch: 1 steps_left: 49995
training epoch: 1 steps_left: 49993
training epoch: 1 steps_left: 49991
training epoch: 1 steps_left: 49989
training epoch: 1 steps_left: 49987
training epoch: 1 steps_left: 49985
training epoch: 1 steps_left: 49983
training epoch: 1 steps_left: 49981
training epoch: 1 steps_left: 49979
training epoch: 1 steps_left: 49977
training epoch: 1 steps_left: 49975
training epoch: 1 steps_left: 49973
training epoch: 1 steps_left: 49971
training epoch: 1 steps_left: 49969
training epoch: 1 steps_left: 49967
training epoch: 1 steps_left: 49965
training epoch: 1 steps_left: 49963
training epoch: 1 steps_left: 49961
training epoch: 1 steps_left: 49959
training epoch: 1 steps_left: 49957
training epoch: 1 steps_left: 49955
training epoch: 1 steps_left: 49953
training epoch: 1 steps_left: 49951
training epoch: 1 steps_left: 49949
training epoch: 1 steps_left: 49947
training epoch: 1 steps_left: 49945
training epoch: 1 steps_left: 49943
training epoch: 1 steps_left: 49941
training epoch: 1 steps_left: 49939