Problem running with GPU?

AjayTalati commented 9 years ago

Hi, I seem to have no problems running the project with CPU. When I make the following changes to try to compile and run with GPU, I get a runtime error.

CMakeLists.txt change

option(CPU_ONLY "Use CPU only for Caffe" OFF)

dqn_main.cpp change

DEFINE_bool(gpu, true, "Use GPU to brew Caffe");

Run command

./dqn FLAGS_gpu true

Output is

layers {
  bottom: "filtered_q_values"
  bottom: "target"
  top: "loss"
  name: "loss"
  type: EUCLIDEAN_LOSS
}
state {
  phase: TRAIN
}
I0122 00:46:38.028017 13190 layer_factory.hpp:78] Creating layer frames_input_layer
I0122 00:46:38.028039 13190 net.cpp:67] Creating Layer frames_input_layer

BLAH BLAH BLAH

I0122 00:46:38.028966 13190 net.cpp:394] conv1_layer <- frames
I0122 00:46:38.028975 13190 net.cpp:356] conv1_layer -> conv1
I0122 00:46:38.028986 13190 net.cpp:96] Setting up conv1_layer
*** Aborted at 1421887598 (unix time) try "date -d @1421887598" if you are using GNU date ***
PC: @ 0x7f9dd4bdb600 (unknown)
*** SIGSEGV (@0x1) received by PID 13190 (TID 0x7f9dd517f800) from PID 1; stack trace: ***
    @ 0x7f9dd30bed40 (unknown)
    @ 0x7f9dd4bdb600 (unknown)
    @ 0x7f9dd4be2776 caffe::caffe_rng_gaussian<>()
    @ 0x7f9dd4b9e439 caffe::GaussianFiller<>::Fill()
    @ 0x7f9dd4bd9d7b caffe::ConvolutionLayer<>::LayerSetUp()
    @ 0x7f9dd4b59ae9 caffe::Net<>::Init()
    @ 0x7f9dd4b5b4fe caffe::Net<>::Net()
    @ 0x7f9dd4b751e0 caffe::Solver<>::InitTrainNet()
    @ 0x7f9dd4b76496 caffe::Solver<>::Init()
    @ 0x7f9dd4b765fd caffe::Solver<>::Solver()
    @ 0x417ea3 caffe::GetSolver<>()
    @ 0x415caa dqn::DQN::Initialize()
    @ 0x40748a main
    @ 0x7f9dd30a9ec5 (unknown)
    @ 0x409798 (unknown)
Segmentation fault (core dumped)

AjayTalati commented 9 years ago

Very sorry - I think it's training fine with GPU now. I probably forgot to rebuild everything or delete the old build directory.

I'm not very good with computers :(

muupan commented 9 years ago

Yes, you need to rebuild it if you change the CPU_ONLY option.
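To illustrate why a rebuild is needed, here is a minimal sketch. It assumes the CMake option ends up as a CPU_ONLY preprocessor definition, which is how Caffe itself guards its GPU code; how this project's CMakeLists.txt forwards the option is an assumption, and CompiledBackend is a hypothetical helper, not part of Caffe or this repository.

```cpp
#include <string>

// Hypothetical helper, not part of Caffe or dqn-in-the-caffe.
// The preprocessor picks one branch when this file is compiled, so the
// result cannot change until the file is compiled again.
std::string CompiledBackend() {
#ifdef CPU_ONLY
  // Built with -DCPU_ONLY (assumed to be what the CMake option adds).
  return "CPU only";
#else
  // Built without the definition: GPU code paths are compiled in.
  return "GPU enabled";
#endif
}
```

Object files left over from an earlier configuration still contain the old branch, which would explain why deleting the old build directory helped here.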

./dqn FLAGS_gpu true is not the correct way to change flag values. If you want to turn on the gpu flag, just write ./dqn -gpu. That way you don't have to modify dqn_main.cpp.
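As a rough illustration of the difference, the sketch below is a simplified stand-in for how a gflags-style boolean flag is recognised. It is not the gflags implementation, and ParsedArgs and ParseGpuFlag are hypothetical names. It accepts the spellings gflags documents for booleans (-gpu, --gpu, -gpu=true, -nogpu) and treats anything without a leading dash as a plain argument.

```cpp
#include <string>
#include <vector>

// Result of scanning the arguments for one boolean flag.
struct ParsedArgs {
  bool gpu = false;                     // value of the "gpu" flag
  std::vector<std::string> positional;  // arguments not recognised as the flag
};

// Simplified stand-in for gflags-style boolean parsing (NOT the gflags code).
// Accepts -gpu, --gpu, -gpu=true, -gpu=false, -nogpu and their "--" forms.
ParsedArgs ParseGpuFlag(const std::vector<std::string>& args,
                        bool default_value) {
  ParsedArgs out;
  out.gpu = default_value;
  for (const std::string& arg : args) {
    if (arg.empty() || arg[0] != '-') {
      // No leading dash: a plain argument, never a flag.
      out.positional.push_back(arg);
      continue;
    }
    // Strip one or two leading dashes.
    std::string name = arg.substr(arg.size() > 1 && arg[1] == '-' ? 2 : 1);
    if (name == "gpu" || name == "gpu=true") {
      out.gpu = true;
    } else if (name == "nogpu" || name == "gpu=false") {
      out.gpu = false;
    } else {
      out.positional.push_back(arg);  // some other flag: leave it untouched
    }
  }
  return out;
}
```

Under this sketch, ParseGpuFlag({"FLAGS_gpu", "true"}, false) leaves gpu at its default and reports two plain arguments, while ParseGpuFlag({"-gpu"}, false) turns the flag on.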

AjayTalati commented 9 years ago

Thanks a lot. Really nice code, very clear and compact :)

I'm writing everything down, so if you want me to post it, just let me know.

May I ask how often Caffe saves the network, or gives some feedback on the loss function and the training of the network? I'm new to Caffe and still working through your code. I guess I need to look into how to pass options to Caffe as well?

I'm also working with some really good guys on a Python/Theano/RL-Glue implementation. There, testing, network saving, and output/control in general are a more formal process (because of RL-Glue) and easier to follow, for noobs at least. We have a forum; it would be great if you joined and gave us some direction, if you have the time.

https://groups.google.com/forum/#!forum/deep-q-learning

https://github.com/spragunr/deep_q_rl

Finally, I'm also trying to implement a very lightweight version in Lua/Torch. This would be a bit closer to DeepMind's implementation, as they've made their emulator public now:

https://github.com/deepmind/xitari

https://github.com/deepmind/alewrap

Cheers :+1:

muupan commented 9 years ago

Currently the network parameters are saved to a file every 50000 iterations. You can change the interval by changing the snapshot param in dqn_solver.prototxt.
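A minimal sketch of what such an interval means, assuming a snapshot is written whenever the iteration count is a positive multiple of the snapshot value and that a value of 0 disables snapshotting. SnapshotDue and SnapshotIterations are hypothetical helpers for illustration, not Caffe functions.

```cpp
#include <vector>

// Returns true when a snapshot would be due at `iter`, assuming one is taken
// at every positive multiple of `interval`. An interval of 0 is treated as
// "snapshotting disabled" (assumption).
bool SnapshotDue(int iter, int interval) {
  return interval > 0 && iter > 0 && iter % interval == 0;
}

// Lists the iterations in [1, max_iter] at which snapshots would be written.
std::vector<int> SnapshotIterations(int max_iter, int interval) {
  std::vector<int> iters;
  for (int i = 1; i <= max_iter; ++i) {
    if (SnapshotDue(i, interval)) iters.push_back(i);
  }
  return iters;
}
```

With an interval of 50000, SnapshotIterations(150000, 50000) yields 50000, 100000 and 150000; lowering the snapshot value in dqn_solver.prototxt would make the saved files appear sooner.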

Thank you for letting me know about the other projects. The forum looks great.

AjayTalati commented 9 years ago

Very warm welcome, great to have you as part of the gang :)

Sorry, yes I see, from http://caffe.berkeleyvision.org/tutorial/solver.html

Snapshotting is configured by:

# The snapshot interval in iterations.
snapshot: 5000
# File path prefix for snapshotting model weights and solver state.
# Note: this is relative to the invocation of the caffe utility, not the
# solver definition file.
snapshot_prefix: "/path/to/model"
# Snapshot the diff along with the weights. This can help debugging training
# but takes more storage.
snapshot_diff: false
# A final snapshot is saved at the end of training unless
# this flag is set to false. The default is true.
snapshot_after_train: true

in the solver definition prototxt.

I like Caffe; it's really well documented.

A bit heavy going at first, but a good investment!

onlytailei commented 8 years ago

Hi, I have another question.

If I set GPU mode in the solver prototxt, do I need to set GPU mode in dqn_main again?

muupan / dqn-in-the-caffe

Problem running with GPU? #4
