Closed by xiyue6911, 5 years ago.
Hi @xiyue6911, thanks for downloading our code. We trained our network on a Titan X GPU (12GB memory). You can try to perform the training by reducing the input image size or the batch size in the config.py file.
@marcellacornia Thanks, that's very helpful. I tried reducing the number of training images from 10000 (SALICON dataset) to about 8500, but the out-of-memory problem still occurred.
By "input image size" I mean the number of rows and cols of the input images. Try with shape_r=240 and shape_c=320.
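For reference, a minimal sketch of what that change might look like in `config.py` (variable names `shape_r`, `shape_c`, and `b_s` follow the repository's config file; the original default values shown in comments are assumptions):

```python
# Hypothetical excerpt of config.py after reducing the input resolution.
# Variable names follow the repository's config.py; original defaults may differ.

# number of rows of input images
shape_r = 240   # reduced from the default resolution
# number of cols of input images
shape_c = 320   # reduced from the default resolution
# batch size
b_s = 10        # lower this too if memory errors persist

# a single RGB input batch then occupies b_s * 3 * shape_r * shape_c * 4 bytes (float32)
input_bytes = b_s * 3 * shape_r * shape_c * 4
print(input_bytes)  # 9216000, i.e. ~8.8 MiB per input batch
```

Note that the input batch itself is small; most of the memory goes to the intermediate feature maps, which also shrink proportionally when the input resolution is reduced.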
@marcellacornia I know. I have changed the batch size to 5 and it seems to work. But is there a problem? The loss is NaN.
Hi @marcellacornia. To train the model with limited GPU memory, I've tried changing the batch size to 5 with the original input resolution, and also changing the input shape to [320, 240] with batch size 20. The same problem occurred in both cases: the loss was normal on the first batch (about 0.006) and then suddenly turned to NaN.
I removed the first batch; the loss was normal at first but then turned to NaN again. I reduced the learning rate to 1e-7, but it didn't help. I checked the input data and found nothing wrong. I use Keras 1.2.2 and Theano 0.8.2. The ground truth consists of saliency maps generated from the saliency .mat files of the SALICON dataset. I'm really frustrated now and would be very grateful for any suggestions.
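One quick sanity check, sketched here with numpy (this is not code from the repository; the function name and shapes are illustrative), is to scan every batch for NaN/Inf values and out-of-range saliency maps before they reach the network:

```python
import numpy as np

def check_batch(images, maps):
    """Raise if a batch contains values that commonly cause NaN losses."""
    for name, arr in (("images", images), ("maps", maps)):
        if not np.all(np.isfinite(arr)):
            raise ValueError("%s contain NaN or Inf" % name)
    # ML-Net expects saliency maps normalized to [0, 1]
    if maps.min() < 0 or maps.max() > 1:
        raise ValueError("saliency maps outside [0, 1]: min=%g, max=%g"
                         % (maps.min(), maps.max()))

# example usage with random data standing in for a real batch
imgs = np.random.rand(5, 3, 240, 320).astype("float32")
sal = np.random.rand(5, 1, 30, 40).astype("float32")
check_batch(imgs, sal)  # passes silently for clean data
```

Calling this inside the generator on every yielded batch narrows down whether the NaN originates in the data or in the optimization itself.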
Hi @xiyue6911, unfortunately I don't know how to help you. Did you check if images and saliency maps are correctly loaded? Our network works with saliency maps normalized between 0 and 1.
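If the maps come straight from the SALICON `.mat` files, a simple per-map rescaling to [0, 1] (a numpy sketch, not code from this repository) would be:

```python
import numpy as np

def normalize_map(sal_map):
    """Scale a saliency map to [0, 1]; leave an all-zero map unchanged."""
    sal_map = sal_map.astype("float32")
    m = sal_map.max()
    return sal_map / m if m > 0 else sal_map

raw = np.array([[0.0, 127.5, 255.0]])  # e.g. an 8-bit saliency map
norm = normalize_map(raw)              # values become [0.0, 0.5, 1.0]
```

Dividing by the per-map maximum (rather than a fixed 255) also guards against maps stored as floats or with a different dynamic range.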
I have read your paper and downloaded your code. Then I changed the dataset from your dataset to MSRA dataset to do some work about saliency detection. But there may be a small error, could you please help me. Thank you very much!
Is it possible to run this code on CPU? I have a GTX 660 GPU and I'm not sure it can handle the training.
@Alinadi23 The code can run on CPU, but you will still need enough RAM to perform backpropagation.
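With the Theano backend, forcing CPU execution is a matter of setting the `THEANO_FLAGS` environment variable (a config sketch; the script name `main.py` is taken from the traceback in this thread):

```shell
# Run training on CPU instead of GPU (Theano backend)
THEANO_FLAGS=device=cpu,floatX=float32 python main.py

# Or set it persistently in ~/.theanorc:
# [global]
# device = cpu
# floatX = float32
```

Expect CPU training to be orders of magnitude slower than on a GPU for a network of this size.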
Hi, I tried to train the model with a GTX 1080 card (8 GB memory), but this error occurred:

```
Training ML-Net
Epoch 1/20
Traceback (most recent call last):
  File "main.py", line 53, in <module>
    ModelCheckpoint('weights.mlnet.{epoch:02d}-{val_loss:.4f}.pkl', save_best_only=True)])
  File "/usr/local/lib/python2.7/dist-packages/Keras-1.2.2-py2.7.egg/keras/engine/training.py", line 1557, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python2.7/dist-packages/Keras-1.2.2-py2.7.egg/keras/engine/training.py", line 1320, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python2.7/dist-packages/Keras-1.2.2-py2.7.egg/keras/backend/theano_backend.py", line 959, in __call__
    return self.function(*inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in __call__
    outputs = self.fn()
MemoryError: Error allocating 393216000 bytes of device memory (CNMEM_STATUS_OUT_OF_MEMORY).
Apply node that caused the error: GpuElemwise{add,no_inplace}(GpuElemwise{add,no_inplace}.0, GpuElemwise{Abs,no_inplace}.0)
Toposort index: 575
Inputs types: [CudaNdarrayType(float32, 4D), CudaNdarrayType(float32, 4D)]
Inputs shapes: [(10, 128, 240, 320), (10, 128, 240, 320)]
Inputs strides: [(9830400, 76800, 320, 1), (9830400, 76800, 320, 1)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuElemwise{mul,no_inplace}(CudaNdarrayConstant{[[[[ 0.5]]]]}, GpuElemwise{add,no_inplace}.0)]]
```
I watched the GPU memory usage climb to 7.8 GB (of 8110 MB total) after I started training, and then this error occurred. The message above says "Error allocating 393216000 bytes"; does this mean about 393 MB more memory is needed? And which card did you use?
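The 393216000-byte figure matches the failing op's reported input shape exactly, so it is the size of one extra intermediate tensor that could not be allocated, not the total shortfall: with the memory pool already near 7.8 GB, there was no room left for it. The arithmetic can be checked in plain Python (assuming float32, i.e. 4 bytes per element, as reported in the error):

```python
# Shape of the failing GpuElemwise inputs, from the Theano error message:
# (batch, channels, rows, cols) = (10, 128, 240, 320), float32
batch, channels, rows, cols = 10, 128, 240, 320
bytes_per_elem = 4  # float32

tensor_bytes = batch * channels * rows * cols * bytes_per_elem
print(tensor_bytes)                 # 393216000, exactly the allocation that failed
print(tensor_bytes / 1024.0 ** 2)   # ~375 MiB for a single intermediate tensor
```

Since the tensor size scales linearly with the batch size and with rows * cols, halving either roughly halves this allocation, which is why reducing them in `config.py` helps.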