Status: Open. Rubikplayer opened this issue 6 years ago.
@Rubikplayer
For the updated error_log, it mentions: pygpu.gpuarray.GpuArrayException: cuMemAlloc: CUDA_ERROR_OUT_OF_MEMORY: out of memory. Can you change cnmem=0.75 to cnmem=0.95 in .theanorc, OR change __GPUMemoryGB = 11 to a safer value, say __GPUMemoryGB = 6, in params.py, and let's see what it prints out.
Also, for the Theano installation, please refer to https://github.com/mjiUST/SurfaceNet/issues/3#issuecomment-371688429
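To see what the cnmem change means in absolute terms, here is a minimal Python sketch. It follows the convention the Theano documentation describes for lib.cnmem (old backend) and gpuarray.preallocate (new backend): a value of 0 or below means no preallocated pool, a value in (0, 1] is a fraction of total GPU memory, and a value above 1 is a size in MB. The docs also say lib.cnmem fractions are clipped to 0.95. The helper name and the 11264 MB total are illustrative assumptions, not part of SurfaceNet.

```python
def pool_size_mb(value: float, total_mb: float, max_fraction: float = 1.0) -> float:
    """Interpret a Theano memory-pool flag value as a pool size in MB.

    value <= 0      -> no preallocated pool
    0 < value <= 1  -> fraction of total GPU memory (capped at max_fraction)
    value > 1       -> absolute size in MB
    """
    if value <= 0:
        return 0.0
    if value <= 1:
        return min(value, max_fraction) * total_mb
    return float(value)


if __name__ == "__main__":
    TOTAL_MB = 11264  # assumed total memory of an 11 GB card
    for cnmem in (0.75, 0.95):
        # lib.cnmem fractions are clipped to 0.95 per the Theano docs
        reserved_gb = pool_size_mb(cnmem, TOTAL_MB, max_fraction=0.95) / 1024
        print(f"cnmem={cnmem}: about {reserved_gb:.2f} GB reserved")
```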
@mjiUST
The code seems to be running after I set gpuarray.preallocate=0.8 (and commented out the cnmem line, i.e. #cnmem=0.75). (This was before I saw your feedback; I will try your suggested values a bit later.)
May I check two things with you?
About gpuarray.preallocate and cnmem: according to the Theano doc link, it seems gpuarray.preallocate was designed for the new GPU backend and cnmem for the old one. Since we are using version 0.9, I suppose I should set cnmem instead of gpuarray.preallocate? If so, then what I just set did not impose any limit at all.
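Here is a hedged sketch of how the two flags map to the two backends in Theano 0.9. Setting device=cuda* selects the new gpuarray (pygpu) backend, whose pool is sized by gpuarray.preallocate. Setting device=gpu* selects the old CUDA backend, which reads lib.cnmem. Because the error quoted earlier is raised by pygpu, this run appears to be using the new backend. Both helper names are hypothetical.

```python
def memory_flag_for_device(device: str) -> str:
    """Return the Theano flag that sizes the GPU memory pool for `device`.

    In Theano 0.9, device=cuda* selects the new gpuarray (pygpu) backend,
    and device=gpu* selects the old CUDA backend.
    """
    if device.startswith("cuda"):
        return "gpuarray.preallocate"  # read by the new gpuarray backend
    if device.startswith("gpu"):
        return "lib.cnmem"  # read by the old CUDA backend
    raise ValueError(f"no GPU memory pool flag for device {device!r}")


def theano_flags(device: str, pool: float) -> str:
    """Build a THEANO_FLAGS string that sets only the flag the backend reads."""
    return f"device={device},{memory_flag_for_device(device)}={pool}"
```

THEANO_FLAGS is read when Theano is imported. For example, `os.environ["THEANO_FLAGS"] = theano_flags("cuda", 0.8)` has to run before `import theano`.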
My setting changes: __GPUMemoryGB = 11 and __cube_D = 32.
Also, my GPU (a 1080 Ti) is probably slower than a Titan X.
Thanks for the help!!
@Rubikplayer Thanks for your feedback. It's great to know the code is running.
For the Theano memory preallocation, the link you mentioned says that once you set the Theano flag allow_gc to False (so Theano does not garbage-collect GPU memory), CNMeM no longer affects GPU speed. In my opinion, CNMeM and gpuarray.preallocate are the same thing for the older and newer versions. Just use whichever one gets the GPU memory preallocated at the very beginning. You can run watch nvidia-smi to check this: the majority of the memory should be reserved.
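As a scriptable alternative to watching nvidia-smi by eye, here is a small Python sketch. It calls nvidia-smi's CSV query mode and reports what fraction of each GPU's memory is in use. Once Theano has started with preallocation enabled, that fraction should be high. The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess

# nvidia-smi prints one "index, used, total" row per GPU (values in MiB)
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text: str) -> list[tuple[int, float]]:
    """Parse 'index, used, total' rows into (index, used / total) pairs."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append((index, used / total))
    return rows


def reserved_fractions() -> list[tuple[int, float]]:
    """Run nvidia-smi once and return the fraction of memory in use per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory_csv(out)
```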
For the speed of SurfaceNet: setting __cube_D = 64 could make the process a little faster. Before that, check whether your .theanorc includes optimizer=fast_run (fast running mode), as mentioned in https://github.com/mjiUST/SurfaceNet/blob/149f6e05c084ee4e757b5bd9b8efef8f46b78ffb/installEnv.sh#L40
If everything goes well, the dinosaur dataset should finish within one hour.
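The settings discussed so far (optimizer=fast_run, allow_gc=False, and a preallocated pool) can be collected into a minimal .theanorc. The Python sketch below renders one with configparser. The values device=cuda and preallocate=0.8 are assumptions taken from this thread's setup, not required defaults. Compare the output with your existing ~/.theanorc rather than overwriting it.

```python
import configparser
import io


def build_theanorc(device: str = "cuda", preallocate: float = 0.8) -> str:
    """Render a minimal .theanorc with the settings discussed in this thread.

    device=cuda (new gpuarray backend) is an assumption; adjust to your setup.
    """
    cfg = configparser.ConfigParser()
    cfg["global"] = {
        "device": device,
        "optimizer": "fast_run",  # optimized graphs for fast execution
        "allow_gc": "False",      # keep intermediate buffers between calls
    }
    # [gpuarray] preallocate corresponds to the gpuarray.preallocate flag
    cfg["gpuarray"] = {"preallocate": str(preallocate)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(build_theanorc())
```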
@mjiUST
Thanks for the suggestion! I tried optimizer=fast_run, and it indeed accelerates the process. But with __cube_D = 64, I still got some out-of-memory issues. I've sent an email to your school address with more detailed questions.
Thanks again!
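Here is one plausible reason __cube_D = 64 runs out of memory where 32 does not. Assume that SurfaceNet's per-cube memory grows roughly linearly with the number of voxels in a D x D x D cube; this is an assumption, not something stated in this thread. Under it, doubling D multiplies per-cube memory by (64/32)^3 = 8. The helper names are hypothetical.

```python
def voxel_count(cube_d: int) -> int:
    """Number of voxels in one D x D x D cube."""
    return cube_d ** 3


def relative_cube_memory(new_d: int, old_d: int) -> float:
    """Per-cube memory ratio, assuming memory is linear in voxel count."""
    return voxel_count(new_d) / voxel_count(old_d)


if __name__ == "__main__":
    print(relative_cube_memory(64, 32))
```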
Hi, thanks for the previous feedback in another thread. After I set up CUDA 8.0/cuDNN 5.1 and Theano 0.9, I can run part of main.py. But there's still an error when executing the patch2embedding() function in the early rejection stage. More specifically:
The detailed error log can be seen here: err_log.txt
I have tried:
theano-cache purge
rm -rf ./.theano
Neither has worked so far.
Have you seen this type of error before? Or did I set up my computer incorrectly? I noticed you have a params.py that specifies all parameters. Some people have mentioned that this error can result from a lack of memory (link), and it seems your code does some batch processing.
Info on my setup:
My ~/.theanorc:
If you have any suggestions, please let me know! Thanks for your help and support!
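This thread does not show how SurfaceNet uses __GPUMemoryGB internally. The Python sketch below only illustrates the general idea of memory-bounded batching, under the assumption that such a parameter caps the batch size. With that assumption, lowering it (for example from 11 to 6) yields smaller batches. The names, the per-item size, and the 20% headroom are all hypothetical.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def max_batch_size(gpu_memory_gb: float, bytes_per_item: int,
                   reserve_fraction: float = 0.2) -> int:
    """Largest batch that fits in the budget, keeping some memory spare."""
    budget = gpu_memory_gb * (1 - reserve_fraction) * 1024 ** 3
    size = int(budget // bytes_per_item)
    if size < 1:
        raise ValueError("a single item does not fit in the memory budget")
    return size


def batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```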
Update:
After I removed the other versions of cuDNN (https://groups.google.com/forum/#!topic/theano-users/w4M3Xy0ec60), the error changed to the following.