marcellacornia / mlnet

A Deep Multi-Level Network for Saliency Prediction. ICPR 2016
MIT License

how much gpu memory is required? #6

Closed xiyue6911 closed 5 years ago

xiyue6911 commented 7 years ago

Hi, I tried to train the model on a GTX 1080 card (8 GB memory), but this error occurred:

```
Training ML-Net
Epoch 1/20
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    ModelCheckpoint('weights.mlnet.{epoch:02d}-{val_loss:.4f}.pkl', save_best_only=True)])
  File "/usr/local/lib/python2.7/dist-packages/Keras-1.2.2-py2.7.egg/keras/engine/training.py", line 1557, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/Keras-1.2.2-py2.7.egg/keras/engine/training.py", line 1320, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python2.7/dist-packages/Keras-1.2.2-py2.7.egg/keras/backend/theano_backend.py", line 959, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
MemoryError: Error allocating 393216000 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuElemwise{add,no_inplace}(GpuElemwise{add,no_inplace}.0, GpuElemwise{Abs,no_inplace}.0)
Toposort index: 575
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D)]
Inputs shapes: [(10, 128, 240, 320), (10, 128, 240, 320)]
Inputs strides: [(9830400, 76800, 320, 1), (9830400, 76800, 320, 1)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuElemwise{mul,no_inplace}(CudaNdarrayConstant{[[[[ 0.5]]]]}, GpuElemwise{add,no_inplace}.0)]]
```

I watched the GPU memory usage increase to 7.8 GB (the total memory is 8110 MB) after I started training, and then this error occurred. The message above says "Error allocating 393216000 bytes"; does this mean 393 MB more memory is needed? And which card did you use?
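For what it's worth, the 393,216,000 bytes in the error is exactly the size of one of the `(10, 128, 240, 320)` float32 tensors listed under "Inputs shapes" in the traceback, so the allocator is failing on a single intermediate activation, not some unrelated overhead:

```python
# Size of one float32 tensor of shape (10, 128, 240, 320),
# matching the "Inputs shapes" line in the traceback above.
batch, channels, rows, cols = 10, 128, 240, 320
bytes_needed = batch * channels * rows * cols * 4  # 4 bytes per float32
print(bytes_needed)  # 393216000, exactly the allocation that failed
```

This also suggests why shrinking the batch size or the input resolution helps: both appear as multiplicative factors in the tensor size.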

marcellacornia commented 7 years ago

Hi @xiyue6911, thanks for downloading our code. We trained our network on a Titan X GPU (12GB memory). You can try to perform the training by reducing the input image size or the batch size in the config.py file.

xiyue6911 commented 7 years ago

@marcellacornia Thanks! That's very helpful. I tried reducing the number of training images from 10,000 (the SALICON dataset) to about 8,500, but the same out-of-memory problem occurred.

marcellacornia commented 7 years ago

With "input image size" I mean the number of rows and cols of input images. Try with shape_r=240 and shape_c=320.

xiyue6911 commented 7 years ago

@marcellacornia I know. I have changed the batch size to 5, and it seems to work. [screenshot from 2017-04-07 showing the training output] Is there any problem? The loss is nan?

xiyue6911 commented 7 years ago

Hi @marcellacornia. To train the model with limited GPU memory, I've tried changing the batch size to 5 with the original input resolution, and also changing the input shape to [320, 240] with batch size 20. The same problem occurred in both cases: the loss was normal on the first batch (about 0.006) and then suddenly turned to NaN.

I removed the first batch; the loss was normal at first, but then turned to NaN again. I reduced the learning rate to 1e-7, but it didn't help. I checked the input data and found nothing wrong. I'm using Keras 1.2.2 and Theano 0.8.2. The ground truth consists of saliency maps generated from the saliency .mat files of the SALICON dataset. I'm really frustrated now and would be very grateful for any suggestions.

marcellacornia commented 7 years ago

Hi @xiyue6911, unfortunately I don't know how to help you. Did you check if images and saliency maps are correctly loaded? Our network works with saliency maps normalized between 0 and 1.
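One quick way to act on this advice is a sanity check on the loaded arrays before training. This is a sketch of an assumed data pipeline, not code from the repo; the function name and the array shapes are illustrative:

```python
import numpy as np

# Hypothetical sanity check: verify saliency maps are NaN-free and
# normalized to [0, 1], as the network expects per the comment above.
def check_maps(maps):
    maps = np.asarray(maps, dtype=np.float32)
    assert not np.isnan(maps).any(), "NaN values found in saliency maps"
    assert maps.min() >= 0.0 and maps.max() <= 1.0, \
        "saliency maps should be normalized to [0, 1]"
    return True

# Example: a valid batch of 5 single-channel 240x320 maps passes.
print(check_maps(np.random.rand(5, 1, 240, 320)))  # True
```

A common pitfall is loading maps as 0-255 uint8 images without dividing by 255, which can blow up the loss and produce NaN after a few batches.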

milumilule commented 7 years ago

I have read your paper and downloaded your code, then switched from your dataset to the MSRA dataset to do some work on saliency detection. However, I ran into a small error; could you please help me? Thank you very much! [screenshot of the error]

Alinadi23 commented 6 years ago

Is it possible to run this code on a CPU? I have a GTX 660 GPU and I am not sure it can handle this model.

immortal3 commented 6 years ago

@Alinadi23 The code can run on a CPU, but you will still need enough RAM to perform backpropagation.