Closed rdaniel closed 11 years ago
Yes, that 10G should be made smaller.
Hi again Nitish.
I've tried 8G, 4G, and 1G for the main_mem value (with gpu_mem=1G). They all still die at line 1621 of cudamat.py. On a minor note, I noticed that the runall_dbn.sh script uses a $cpu_mem variable that is not defined in the script. However, things still die at the same line if I replace that with the main_mem value. My next step is to debug this while keeping an eye on memory use, but if you have other ideas, that would be great.
Thanks, Ron
I think I've found it. It didn't take long once I had time to step through the code in the debugger. extract_rbm_representation.py takes the gpu_mem and cpu_mem parameters, but it also has a memory=10G parameter. That constant 10G value was passed along to the WriteRepresentationToDisk() call. I replaced it with the main_mem parameter and things worked. I'll close this issue now.
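For anyone hitting the same thing, here is a minimal sketch of the idea, not deepnet's actual code: parse size strings like "10G" and make sure a hard-coded default can never exceed the main_mem budget. The helper names parse_mem and effective_memory are hypothetical, and clamping is a slight variation on the fix described above, which simply passed main_mem through.

```python
import re

# Binary multipliers for the size suffixes this sketch understands.
_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_mem(spec):
    """Convert a size string such as '10G' or '512M' to a byte count.

    Hypothetical helper: deepnet's own parsing may differ.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG])", spec.strip().upper())
    if match is None:
        raise ValueError("unrecognised memory spec: %r" % spec)
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def effective_memory(requested, main_mem):
    """Return the smaller of the two specs, so a default never exceeds main_mem."""
    if parse_mem(requested) <= parse_mem(main_mem):
        return requested
    return main_mem
```

With something like this, effective_memory("10G", "8G") gives back "8G", so the value handed to the extraction step stays within what the machine was told it could use.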
Hi Nitish,
Thanks for making deepnet available. I'm looking forward to working with it. I'm able to run most of the examples, but when I try the multimodal_dbn, I get MemoryErrors when extract_rbm_representation.py is being run. I'm running on an AWS Cluster GPU instance, which has 6 GB on the GPU and 22 GB of main memory. I've been progressively shrinking the gpu_mem (from 4 to 2 to 1) and main_mem (20, 18, 16, 10, 8) values. With them set to 1G and 8G, respectively, the image layer 1 extract completed for the first time, and layer 2 trained, but then the extract on layer 2 failed. The error dump is appended.
Any suggestions on how to fix this? I notice in extract_rbm_representation.py that there is an additional memory=10G setting. Does that need to be no larger than the main_mem setting?
Thanks, Ron
Writing to /vol/FlickrPreproc/flickr/dbn_reps/image_rbm2_LAST/train 998
Traceback (most recent call last):
File "/home/ubuntu/src/deepnet-master/deepnet/extract_rbm_representation.py", line 81, in <module>
main()
File "/home/ubuntu/src/deepnet-master/deepnet/extract_rbm_representation.py", line 76, in main
data_proto=data_proto)
File "/home/ubuntu/src/deepnet-master/deepnet/extract_rbm_representation.py", line 40, in ExtractRepresentations
layernames, output_dir, memory=memory, dataset=dataset, input_recon=True)
File "/home/ubuntu/src/deepnet-master/deepnet/dbm.py", line 360, in WriteRepresentationToDisk
datagetter()
File "/home/ubuntu/src/deepnet-master/deepnet/neuralnet.py", line 370, in GetTrainBatch
self.GetBatch(self.train_data_handler)
File "/home/ubuntu/src/deepnet-master/deepnet/dbm.py", line 229, in GetBatch
super(DBM, self).GetBatch(handler=handler)
File "/home/ubuntu/src/deepnet-master/deepnet/neuralnet.py", line 361, in GetBatch
data_list = handler.Get()
File "/home/ubuntu/src/deepnet-master/deepnet/datahandler.py", line 627, in Get
batch = self.gpu_cache.Get(self.batchsize, get_last_piece=self.get_last_piece)
File "/home/ubuntu/src/deepnet-master/deepnet/datahandler.py", line 396, in Get
self.LoadData()
File "/home/ubuntu/src/deepnet-master/deepnet/datahandler.py", line 332, in LoadData
self.data[i].overwrite(mat)
File "/home/ubuntu/src/deepnet-master/cudamat/cudamat.py", line 161, in overwrite
array = reformat(array)
File "/home/ubuntu/src/deepnet-master/cudamat/cudamat.py", line 1621, in reformat
return np.array(array, dtype=np.float32, order='F')
MemoryError
./runall_dbn.sh: line 71: 3880 Segmentation fault (core dumped) python ${extract_rep} ${model_output_dir}/image_rbm2_LAST trainers/dbn/train_CD_image_layer2.pbtxt image_hidden2 ${data_output_dir}/image_rbm2_LAST ${gpu_mem} ${cpu_mem}
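A note on the last frame of the traceback: the failing statement is a plain NumPy call that builds a new float32, column-major array in host memory, which is why the error is a Python MemoryError rather than a GPU allocation failure. The sketch below, with an arbitrary made-up array shape, shows that the call produces a separate buffer and how to estimate its size beforehand. The helper float32_copy_bytes is hypothetical and not part of cudamat.

```python
import numpy as np


def float32_copy_bytes(shape):
    """Bytes needed to hold a float32 array of the given shape."""
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(np.float32).itemsize


# Arbitrary example data: a C-ordered float64 array on the host.
src = np.zeros((1000, 200), dtype=np.float64)

# Same call as in the traceback: converts dtype and memory layout.
dst = np.array(src, dtype=np.float32, order='F')

# The result lives in its own buffer, so both arrays occupy RAM at once.
is_separate_copy = not np.shares_memory(src, dst)
needed = float32_copy_bytes(src.shape)
```

Because the source array and the converted copy exist at the same time, the host needs room for both, on top of whatever the data handler has already cached.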