spragunr / deep_q_rl

Theano-based implementation of Deep Q-learning
BSD 3-Clause "New" or "Revised" License

Segmentation fault #6

Closed sridharmahadevan closed 9 years ago

sridharmahadevan commented 9 years ago

Hi,

I've been trying to get your code working, and I'm almost there, but I'm still getting a seg fault. The system is running, but it's not saving out any results. Here is what I get when I run your script. Thanks for your help!

python ale_run.py --exp_pref data | more
RL-Glue Version 3.04, Build 909
A.L.E: Arcade Learning Environment (version 0.4.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
  ROM file:          /home/mahadeva/Documents/code/deep_rl/roms/breakout.bin
  Cart Name:         Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:          f34f08e5eb96e500e851a80be3277a56
  Display Format:    AUTO-DETECT ==> NTSC
  ROM Size:          2048
  Bankswitch Type:   AUTO-DETECT ==> 2K

Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Initializing ALE RL-Glue ...
Using gpu device 1: GeForce GTX 980

In file included from /usr/include/python2.7/numpy/ndarraytypes.h:1761:0,
                 from /usr/include/python2.7/numpy/ndarrayobject.h:17,
                 from /usr/include/python2.7/numpy/arrayobject.h:4,
                 from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c:239:
/usr/include/python2.7/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it by " \
 ^
In file included from /usr/include/python2.7/numpy/ndarrayobject.h:26:0,
                 from /usr/include/python2.7/numpy/arrayobject.h:4,
                 from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c:239:
/usr/include/python2.7/numpy/__multiarray_api.h:1629:1: warning: ‘_import_array’ defined but not used [-Wunused-function]
 _import_array(void)
 ^
In file included from /usr/include/python2.7/numpy/ufuncobject.h:327:0,
                 from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c:240:
/usr/include/python2.7/numpy/__ufunc_api.h:241:1: warning: ‘_import_umath’ defined but not used [-Wunused-function]
 _import_umath(void)
 ^

Traceback (most recent call last):
  File "./rl_glue_ale_agent.py", line 430, in <module>
    main()
  File "./rl_glue_ale_agent.py", line 426, in main
    AgentLoader.loadAgent(NeuralAgent())
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/AgentLoader.py", line 58, in loadAgent
    client.runAgentEventLoop()
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 144, in runAgentEventLoop
    switch[agentState](self)
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 137, in <lambda>
    Network.kAgentInit: lambda self: self.onAgentInit(),
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 43, in onAgentInit
    self.agent.agent_init(taskSpec)
  File "./rl_glue_ale_agent.py", line 158, in agent_init
    self.network = self._init_network()
  File "./rl_glue_ale_agent.py", line 192, in _init_network
    approximator='cuda_conv')
  File "/home/mahadeva/Documents/code/deep_rl/deep_q_rl/cnn_q_learner.py", line 168, in __init__
    target = theano.gradient.consider_constant(target)
AttributeError: 'module' object has no attribute 'consider_constant'
Segmentation fault (core dumped)

RL-Glue Python Agent Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Agent Codec Connected
(32, 4, 80, 80)
(4, 80, 80, 32)
(16, 19.0, 19.0, 32)
(32, 9.0, 9.0, 32)
(32, 32, 9.0, 9.0)
(32, 256)
(32, 18)
training epoch: 1 steps_left: 50000
training epoch: 1 steps_left: 49995
training epoch: 1 steps_left: 49993
training epoch: 1 steps_left: 49991
training epoch: 1 steps_left: 49989
training epoch: 1 steps_left: 49987
training epoch: 1 steps_left: 49985
training epoch: 1 steps_left: 49983
training epoch: 1 steps_left: 49981
training epoch: 1 steps_left: 49979
training epoch: 1 steps_left: 49977
training epoch: 1 steps_left: 49975
training epoch: 1 steps_left: 49973
training epoch: 1 steps_left: 49971
training epoch: 1 steps_left: 49969
training epoch: 1 steps_left: 49967
training epoch: 1 steps_left: 49965
training epoch: 1 steps_left: 49963
training epoch: 1 steps_left: 49961
training epoch: 1 steps_left: 49959
training epoch: 1 steps_left: 49957
training epoch: 1 steps_left: 49955
training epoch: 1 steps_left: 49953
training epoch: 1 steps_left: 49951
training epoch: 1 steps_left: 49949
training epoch: 1 steps_left: 49947
training epoch: 1 steps_left: 49945
training epoch: 1 steps_left: 49943
training epoch: 1 steps_left: 49941
training epoch: 1 steps_left: 49939

spragunr commented 9 years ago

I think this is a duplicate of Issue #3. When I get a few minutes I will update the README to emphasize that this package requires recent GitHub versions of Theano, Pylearn2, and ALE. I've considered forking those packages to provide a set of known-working dependencies, but I would rather keep up to date by fixing issues as they arise.
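As a quick sanity check (a minimal sketch, not part of this repo), you can verify that your installed Theano is new enough; the consider_constant keyword of theano.grad shown at the end is the standard equivalent of the helper that cnn_q_learner.py calls:

# Minimal sketch: verify that the installed Theano is new enough for
# theano.gradient.consider_constant, which cnn_q_learner.py calls.
import theano
from theano import gradient
import theano.tensor as T

print(theano.__version__)
if not hasattr(gradient, 'consider_constant'):
    print("Theano too old -- install the development version from GitHub")

# Equivalent effect without the helper: pass the target through theano.grad's
# consider_constant argument so no gradient flows through it.
q = T.vector('q')
target = T.vector('target')
loss = T.mean((target - q) ** 2)
grad_q = theano.grad(loss, wrt=q, consider_constant=[target])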

sridharmahadevan commented 9 years ago

Yes, I just saw that and rebuilt Theano. I've started another run now...looks like it is finally working!

Thanks for your response!

python ale_run.py --exp_pref data
RL-Glue Version 3.04, Build 909
RL-Glue is listening for connections on port=4096
A.L.E: Arcade Learning Environment (version 0.4.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
  ROM file:          /home/mahadeva/Documents/code/deep_rl/roms/breakout.bin
  Cart Name:         Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:          f34f08e5eb96e500e851a80be3277a56
  Display Format:    AUTO-DETECT ==> NTSC
  ROM Size:          2048
  Bankswitch Type:   AUTO-DETECT ==> 2K

Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
RL-Glue :: Experiment connected.
Initializing ALE RL-Glue ...
RL-Glue :: Environment connected.
Using gpu device 1: GeForce GTX 980
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Agent Codec Connected
RL-Glue :: Agent connected.
(32, 4, 80, 80)
(4, 80, 80, 32)
(16, 19.0, 19.0, 32)
(32, 9.0, 9.0, 32)
(32, 32, 9.0, 9.0)
(32, 256)
(32, 18)
/home/mahadeva/Downloads/Theano/theano/gof/cmodule.py:289: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  rval = __import__(module_name, {}, {}, [module_name])

OPENING data_01-14-15-14_0p0001_0p9/results.csv
training epoch: 1 steps_left: 50000
Simulated at a rate of 55.4920442312/s
Average loss: 0.100790430055
training epoch: 1 steps_left: 49780
Simulated at a rate of 58.814803058/s
Average loss: 0.0572934835375
training epoch: 1 steps_left: 49597
Simulated at a rate of 60.2730953073/s
Average loss: 0.0623961077461
training epoch: 1 steps_left: 49271
Simulated at a rate of 53.5896811515/s
Average loss: 0.0776437076656
training epoch: 1 steps_left: 49144

sridharmahadevan commented 9 years ago

I ran into a bit of difficulty plotting the results.csv file. Here is the output. Any suggestions?

Thanks,

ls -l data_01-14-15-14_0p0001_0p9/
total 17168
-rw-rw-r-- 1 mahadeva mahadeva    15754 Jan 14 10:44 learning.csv
-rw-rw-r-- 1 mahadeva mahadeva 17558418 Jan 14 10:29 network_file_1.pkl
-rw-rw-r-- 1 mahadeva mahadeva       95 Jan 14 10:31 results.csv

mahadeva@manifold:~/Documents/code/deep_rl/deep_q_rl$ python plot_results.py data_01-14-15-14_0p0001_0p9/results.csv
Traceback (most recent call last):
  File "plot_results.py", line 22, in <module>
    plt.plot(results[:, 0], np.convolve(results[:, 3], kernel, mode='same'), '-*')
IndexError: too many indices

spragunr commented 9 years ago

I haven't seen that before. If you post the contents of results.csv (should be at most a few lines long), I can take a look at it.

sridharmahadevan commented 9 years ago

Never mind, it works fine. I think I was trying to plot the results.csv file when it only had two data points; that caused the error I sent you. Now it has 4 data points and it plots fine (see attached).
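In case it bites anyone else, here is a hedged sketch of the failure and a guard (this is not the repo's plot_results.py; the file name, column layout, and kernel are just taken from the output above): with a single data row np.loadtxt returns a 1-D array, so results[:, 0] raises the "too many indices" error, and np.atleast_2d avoids it.

import numpy as np
import matplotlib.pyplot as plt

# Force a 2-D array even when results.csv holds only a single data row.
results = np.atleast_2d(
    np.loadtxt(open('results.csv', 'r'), delimiter=',', skiprows=1))

kernel = np.ones(1)  # no smoothing for a short run (illustrative choice)
plt.plot(results[:, 0],
         np.convolve(results[:, 3], kernel, mode='same'), '-*')
plt.xlabel('epoch')
plt.ylabel('reward per episode')
plt.show()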

spragunr commented 9 years ago

Great! The README has been updated.

sridharmahadevan commented 9 years ago

Two additional questions: I noticed CPU usage is somewhat low. I'm using a beefy 16-processor machine and have set Theano to use all 16 threads. Anything else I can do to speed it up?

Also, I have two GPUs. Any way to use both? Theano has a flag that tells it which GPU to use. I wonder if it can be made to use both.

spragunr commented 9 years ago

I haven't done any benchmarking for quite a while, but my recollection is that most of the bottleneck is on the GPU in Theano, and most of that time is spent computing the convolutions. This is followed by ALE (single-threaded), and overhead associated with RL-Glue. There isn't much for Theano to parallelize that isn't already running on the GPU.

I don't have any experience trying to get multiple GPUs working with Theano. I'm skeptical that there would be a huge speedup. I've run this code on several different GPUs with widely varying numbers of cores. The performance seems to track the frequency of the GPU more than the number of available cores.
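If the goal is just to keep both cards busy rather than to split one network across them, one workaround (a sketch, not a feature of this code) is to pin two independent experiments to different devices via the standard THEANO_FLAGS device setting:

import os
import subprocess

# Launch one independent ale_run.py experiment per GPU. 'gpu0'/'gpu1' follow
# the old Theano device-naming convention; the --exp_pref values are arbitrary.
for device, prefix in [('gpu0', 'data_gpu0'), ('gpu1', 'data_gpu1')]:
    env = dict(os.environ)
    env['THEANO_FLAGS'] = 'device=%s,floatX=float32' % device
    subprocess.Popen(['python', 'ale_run.py', '--exp_pref', prefix], env=env)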

You might get some speedup by trying out some of the more recent convolution implementations that are being integrated into Theano -- in particular cuDNN. Let me know if you have any luck.

-Nathan

sridharmahadevan commented 9 years ago

OK, I realized I had not downloaded cuDNN, so it's possible that Theano will go faster with that. I just downloaded and installed it from NVIDIA.
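A quick way to confirm that Theano actually sees the new cuDNN install (hedged; this assumes a Theano recent enough to ship theano.sandbox.cuda.dnn, the old CUDA backend this code targets):

import theano
from theano.sandbox.cuda import dnn

# dnn_available() returns True only if Theano can find and link against cuDNN.
print(theano.__version__)
print('cuDNN available: %s' % dnn.dnn_available())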

My latest problem is that I can't get the SDL display to work. I've compiled and installed the SDL libraries, but the build still can't find the header file SDL.h, even though I've added the necessary -I/place/where/SDL/is include path to the makefile. No luck so far.

ghost commented 9 years ago

When you built the Arcade Learning Environment, did you set the flag in makefile.unix?

If you look at page 11 of the manual that is included with the ALE code, it says:

To compile with SDL support, you should set USE_SDL=1 in the ALE makefile.

Hope that helps?

When choosing a trained network to watch, you'd probably be best off training for 100 epochs and then choosing the .pkl file with the highest per-episode score. For Breakout, a network with an average score over 60 would be fun to watch.
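As a minimal sketch (not a script from this repo), you can at least confirm that a snapshot such as network_file_1.pkl loads cleanly before trying to watch it; run it from the deep_q_rl directory so the pickled classes can be resolved:

import cPickle

# Unpickling needs the repo's modules importable, since the snapshot stores
# instances of the learner classes rather than plain arrays.
with open('network_file_1.pkl', 'rb') as f:
    network = cPickle.load(f)
print(type(network))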

sridharmahadevan commented 9 years ago

Yes, that's the problem. I can compile ALE with the USE_SDL flag set to 0. If I set the flag to 1, compilation fails because it can't find SDL.h (whose location I have set in the INCLUDES directive in the Makefile).

sridharmahadevan commented 9 years ago

OK, everything works fine now! I had to install some missing SDL packages, and also add some INCLUDE links.

The program's been running for a day now, and it does not seem to have learned a very good policy for Breakout. I'll give it another day and see. The learning definitely needs to be accelerated, perhaps with some parameter tuning.

Thanks for all your help!

ghost commented 9 years ago

In my synaptic package manager, it shows I have the dev libraries installed for

• libsdl2
• libsdl2-gfx
• libsdl2-image

Was it those libraries which were missing?

sridharmahadevan commented 9 years ago

Yes, except I installed the SDL versions of those, not SDL2. The code is set up for SDL; otherwise you have to go and change all the includes from "SDL/SDL.h" to "SDL2/SDL.h".

ghost commented 9 years ago

So, just to clear things up, I think/guess you need to have the following installed,

• libsdl1.2-dev
• libsdl-gfx1.2-dev
• libsdl-image1.2-dev

with those maybe you don't need to change any INCLUDE links? Hopefully that will help someone in the future?

I'd be really interested in your parameter tunings if you did manage to find a set which gave you faster learning.

Thanks, Ajay

alito commented 9 years ago

A discount rate of 0.95 helps quite a bit. (We need a forum or something; an issue tracker on GitHub doesn't seem like the right place for these conversations.)
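For anyone new to the parameter, an illustrative snippet (not code from this repo) of where the discount factor enters the one-step Q-learning target; with gamma = 0.95 instead of 0.9, rewards arriving 20-30 steps later still carry noticeable weight:

import numpy as np

gamma = 0.95
reward = np.array([0.0, 1.0])      # dummy batch of rewards
terminal = np.array([0.0, 1.0])    # 1.0 marks the end of an episode
q_next = np.array([[0.2, 0.5],     # dummy Q(s', a) values for the next states
                   [0.1, 0.3]])
target = reward + gamma * (1.0 - terminal) * q_next.max(axis=1)
print(target)                      # [ 0.475  1.   ]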

sridharmahadevan commented 9 years ago

I started another run since I now have cuDNN installed. It makes a difference since the game is now being simulated at a rate of around 65 steps/second (which is quite a bit faster than before). I also increased the discount factor to 0.95. Will check on it in the evening to see how it's progressing.

ghost commented 9 years ago

I set up a forum; its address is:

http://deep-learning.boards.net/

If anyone wants to take over or share the admin, that would be great.

spragunr commented 9 years ago

@AjayTalati Thanks for taking the initiative to set that up. I'd prefer to use a Google web forum, if only because that's the mechanism used by other projects related to this one. I've set something up at:

https://groups.google.com/forum/#!forum/deep-q-learning

sridharmahadevan commented 9 years ago

The results I am getting seem quite a bit worse than those in the published NIPS paper. Mean Q after 30 episodes on Breakout, as shown in the paper, is around 2; mine is 10 times lower, around 0.2. Similarly, reward per epoch should be around 50 and stable; mine is very unstable, hovering between 3 and 10.

Clearly, something is amiss. Anyone get better results than this?

ghost commented 9 years ago

[attached plot: 100_breakout]

30 epochs is rather too short a run to say anything conclusive, though.

If you carry on to 100 epochs, your reward-per-episode score should get to around 60. That's the 4th column in the results.csv file.

Maybe Nathan can have a look at the headers in the CSV file? I think there might be a labelling issue at line 200 of rl_glue_ale_agent.py: the 4th column should probably be labelled reward_per_episode instead of the current reward_per_epoch. So I think it should be:

self.results_file.write(\
    'epoch,num_episodes,total_reward,reward_per_episode,mean_q\n')

As for the mean_q value, I'm still unsure how exactly to calculate it, so if you know how it's done, I'd like to understand.
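My rough guess at what the NIPS paper does (a hedged sketch; the names q_values_fn and holdout_states are illustrative, not from this repo) is to collect a fixed set of states once, e.g. from random play, and after each epoch report the average of max_a Q(s, a) over that held-out set:

import numpy as np

def mean_q(q_values_fn, holdout_states):
    # q_values_fn maps a batch of states to an (n_states, n_actions) array.
    q = q_values_fn(holdout_states)
    return float(np.mean(np.max(q, axis=1)))

# Dummy usage: a fake Q-function over 5 held-out states and 4 actions.
fake_q = lambda states: np.random.RandomState(0).rand(len(states), 4)
print(mean_q(fake_q, np.zeros((5, 84))))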

ghost commented 9 years ago

Sridhar, perhaps we should transfer this conversation over to the forum Nathan has setup?

https://groups.google.com/forum/#!forum/deep-q-learning

sridharmahadevan commented 9 years ago

Yes, good idea. I posted my question there. After 80 episodes, my results are quite poor. It has not learned much, if anything.
