marcellacornia / sam

Predicting Human Eye Fixations via an LSTM-based Saliency Attentive Model. IEEE Transactions on Image Processing (2018)
https://ieeexplore.ieee.org/document/8400593
MIT License

Not getting good results using pretrained model #3

Closed karandwivedi42 closed 6 years ago

karandwivedi42 commented 6 years ago

Thanks for open sourcing the code :)

I ran the pre-trained resnet-SAM model on the complete CAT2000 trainSet and calculated the mean CC. I get 0.65, which is much lower than the test CC reported on the MIT Saliency page (0.89).
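
For reference, CC here is Pearson's linear correlation coefficient between a predicted saliency map and the ground-truth map. The following is a minimal NumPy sketch of that computation, not the code from either repository mentioned in this thread; the function name `correlation_coefficient` is hypothetical, and both maps are assumed to have already been brought to the same resolution.

```python
import numpy as np


def correlation_coefficient(pred, gt):
    """Pearson correlation between two saliency maps of identical shape."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("maps must have the same shape; resize one of them first")

    pred_std = pred.std()
    gt_std = gt.std()
    if pred_std == 0 or gt_std == 0:
        # A constant map has no variance, so the correlation is undefined.
        return float("nan")

    # Standardize both maps, then average the element-wise product.
    pred_z = (pred - pred.mean()) / pred_std
    gt_z = (gt - gt.mean()) / gt_std
    return float((pred_z * gt_z).mean())
```

A mean CC over a dataset would then be the average of this value over all image pairs. Differences in how the maps are resized before this step can change the score, so it is worth checking that step when numbers disagree.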

I ran the model exactly as described in the README and am using this repo for CC metrics.

Is there something I am missing?

marcellacornia commented 6 years ago

Hi @karandwivedi42, thanks for downloading our code.

For the CAT2000 dataset, we took our model pre-trained on SALICON and fine-tuned it on the CAT2000 training set. We randomly selected 200 images as a validation set (10 per category) and resized all images to 180 x 320.
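
The per-category validation split described above could be sketched as follows. This is only an illustration under assumptions, not the authors' actual script: the function name `split_per_category`, the `category_of` callback, the fixed seed, and the `Category/file.jpg` path layout are all hypothetical.

```python
import random
from collections import defaultdict


def split_per_category(image_paths, category_of, n_val_per_category=10, seed=0):
    """Hold out a fixed number of randomly chosen images from every category."""
    by_category = defaultdict(list)
    for path in image_paths:
        by_category[category_of(path)].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for category in sorted(by_category):
        items = sorted(by_category[category])
        rng.shuffle(items)
        val.extend(items[:n_val_per_category])
        train.extend(items[n_val_per_category:])
    return train, val


# Hypothetical layout: 20 categories with 100 training images each.
paths = ["cat%02d/%03d.jpg" % (c, i) for c in range(20) for i in range(100)]
train_paths, val_paths = split_per_category(paths, lambda p: p.split("/")[0])
```

With 20 categories and 10 held-out images each, this yields the 200 validation images mentioned above and leaves 1800 for fine-tuning.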

For the evaluation, I suggest using the MATLAB code published by the MIT Saliency Benchmark organizers.

karandwivedi42 commented 6 years ago

Thanks a lot!

Also, I would be really grateful if you could tell me what these 3 parameters should be for CAT2000 images.

marcellacornia commented 6 years ago

The other parameters are as follows:

    shape_r = 180
    shape_c = 320
    shape_r_gt = 23
    shape_c_gt = 40
    shape_r_fix = 360
    shape_c_fix = 640
    upsampling_factor = 16

As you can see, because of memory constraints, we decided not to use the original image size of the CAT2000 dataset. All images in this dataset have the same size, 1080 x 1920, which is a little too big. However, we resized our predictions to this size before the evaluation.

Additionally, we use a Lambda layer before returning the output of the model (after the upsampling):

    outs_up = Lambda(prepare_output, prepare_output_shape)(outs_up)

where the two functions are as follows:

    def prepare_output(x):
        return x[:, :, :shape_r_fix, :]

    def prepare_output_shape(s):
        return s[:2] + (shape_r_fix, shape_c_fix)
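
To see why this crop is needed, it helps to work through the sizes: upsampling a 23 x 40 map by a factor of 16 gives 368 x 640, which is 8 rows taller than the 360 x 640 fixation map. The sketch below checks this with a plain NumPy array standing in for the Keras tensor. It assumes a channels-first layout (batch, channels, rows, columns), which is what the slice on the third axis suggests; the batch size of 2 is arbitrary.

```python
import numpy as np

# Values quoted in the comment above.
shape_r_gt, shape_c_gt = 23, 40
shape_r_fix, shape_c_fix = 360, 640
upsampling_factor = 16

# Size of the map after upsampling: 368 rows x 640 columns.
up_rows = shape_r_gt * upsampling_factor
up_cols = shape_c_gt * upsampling_factor

# Stand-in for the upsampled model output (assumed channels-first).
outs_up = np.zeros((2, 1, up_rows, up_cols), dtype=np.float32)


def prepare_output(x):
    # Drop the extra rows at the bottom so the height matches shape_r_fix.
    return x[:, :, :shape_r_fix, :]


def prepare_output_shape(s):
    # Keep batch and channel dimensions, replace the spatial ones.
    return s[:2] + (shape_r_fix, shape_c_fix)


cropped = prepare_output(outs_up)
extra_rows = up_rows - shape_r_fix  # rows removed by the crop
```

The column count already matches (40 x 16 = 640), so only the rows need trimming.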

Hope it works now!

karandwivedi42 commented 6 years ago

Thanks a lot :)