wenguanwang / DHF1K

Revisiting Video Saliency: A Large-scale Benchmark and a New Model (CVPR18, PAMI19)

Regarding saliency metric (especially CC) #14

Closed: snlee81 closed this issue 4 years ago

snlee81 commented 4 years ago

Hi,

First of all, thank you for providing such a nice dataset and code for the evaluation metrics. I would like to evaluate my saliency results using the five metrics discussed in the paper.

I found that the linear correlation coefficient (CC) can take both positive and negative values in [-1, 1]. So, when you report the CC score, did you take the absolute value of each per-frame CC before averaging over frames and clips? Your MATLAB code doesn't do that, but doing so would make sense to me.

Look forward to your reply.

wenguanwang commented 4 years ago

Hi, @snlee81 , thanks for your interest.

The functions are borrowed from popular saliency benchmarks.

As you can see, the computation of CC is based on the MATLAB function 'corr2', whose value ranges from -1 to 1.
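
Something like the following, as a rough sketch (not the exact code in this repo; the file names are made up, and both maps are assumed to be grayscale images of the same scene):

```matlab
% Hypothetical file names; salMap is a predicted saliency map, gtMap the
% ground truth (fixations blurred with a Gaussian), both grayscale in [0, 1].
salMap = im2double(imread('prediction_0001.png'));
gtMap  = im2double(imread('groundtruth_0001.png'));
salMap = imresize(salMap, size(gtMap));  % match resolutions if needed

% corr2 returns the 2-D Pearson correlation coefficient, so the
% per-frame score already lies in [-1, 1].
cc = corr2(salMap, gtMap);
```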

snlee81 commented 4 years ago

Hi @wenguanwang

Thank you for your answer. I know these metrics are from the MIT saliency benchmark. My question is more about how to compute CC for a whole video.

Basically, each saliency map and its ground truth (fixations followed by Gaussian blur) yield a single CC in [-1, 1]. Suppose the CCs are negative for the first half of a video's frames and positive for the second half. Then, if you compute a representative value for that video by simple averaging, it "may" end up near zero, since the positive and negative values cancel out.

So my question is how to compute this single value per video from the per-frame CCs. Relatedly, I see that the DHF1K benchmark reports all-positive CC values for more than 10 methods: https://mmcheng.net/videosal/ For the 300 test videos, how did you obtain the representative CCs? For example, is it a simple average? Or abs()? Or \sqrt{\sum (CC)^2}? I may have missed that part of your paper, but it would be great to know how you did it.

Look forward to your reply.

wenguanwang commented 4 years ago

@snlee81

You can find related code in 'evaluationFunc'.

The CC score is first computed for each frame. Then I directly average the scores over the whole dataset.

To save time, I randomly sample 50,000 frames per dataset : )
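
In sketch form, the aggregation looks something like this (placeholder names throughout; 'loadFramePair' is a hypothetical helper, not the actual 'evaluationFunc' code):

```matlab
% Placeholder sketch of the aggregation: per-frame CC, then a plain mean
% over a random sample of frames. allFrames and loadFramePair are
% hypothetical stand-ins for the real frame list and loading code.
numSamples = 50000;
idx = randperm(numel(allFrames), min(numSamples, numel(allFrames)));

ccScores = zeros(numel(idx), 1);
for k = 1:numel(idx)
    [salMap, gtMap] = loadFramePair(allFrames(idx(k)));  % hypothetical loader
    ccScores(k) = corr2(salMap, gtMap);
end

datasetCC = mean(ccScores);  % simple average; no abs() or squaring
```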

snlee81 commented 4 years ago

@wenguanwang

Thank you for your quick answer!

Yes, I saw that code and ran it myself on the DHF1K dataset. The problem I had was that the averaged CC scores converged to somewhere near 0. I investigated further and found that some CCs are negative and some are positive. But you didn't apply abs() or (.)^2 to make them all positive, right?

I didn't randomly sample 50,000 frames per dataset, but averaged over all of them.

wenguanwang commented 4 years ago

@snlee81

Yes, I do not apply abs() or (.)^2. The evaluation simply follows the standard setting in the field.

snlee81 commented 4 years ago

@wenguanwang I got it! Thank you for confirming!

wenguanwang commented 4 years ago

@snlee81 BTW, consider an extreme case. If a model returns a saliency prediction map where the score at each position (i.e., 1-v) is exactly the opposite of the ground-truth value (i.e., v), a CC score of about -1 will be returned. So there is no need to apply any extra operation, such as abs() or (.)^2, to the CC computation. If you have a good saliency prediction, the CC score should be close to 1. When you get a negative score, the prediction and ground truth are negatively correlated.
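
You can verify this extreme case in a few lines (a toy example with a synthetic ground-truth map):

```matlab
% Toy check: a prediction of exactly 1 - v for ground truth v gives CC = -1.
gtMap  = mat2gray(fspecial('gaussian', 64, 8));  % synthetic map v in [0, 1]
salMap = 1 - gtMap;                              % perfectly inverted prediction
corr2(salMap, gtMap)                             % returns -1
```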

wenguanwang commented 4 years ago

@snlee81 And you can try the CC loss in my training code, which helps to improve this score.
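
The idea behind the loss is simply to maximize CC by minimizing its negation. A minimal sketch of that idea (not the actual training code, which lives in this repo):

```matlab
% Sketch only: a negative-CC loss, so that minimizing the loss pushes the
% Pearson correlation between prediction and ground truth toward 1.
function loss = ccLoss(pred, gt)
    % Standardize both maps (population std, flag 1, so the mean of the
    % elementwise product equals the Pearson correlation exactly).
    predN = (pred - mean(pred(:))) / (std(pred(:), 1) + eps);
    gtN   = (gt   - mean(gt(:)))   / (std(gt(:),   1) + eps);
    loss  = -mean(predN(:) .* gtN(:));  % equals -CC(pred, gt)
end
```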

snlee81 commented 4 years ago

@wenguanwang

Oh, indeed. If the predicted saliency results are completely inverted relative to the ground truth, the correlation is -1, which is worse than zero correlation. I had thought a negative correlation was better than no correlation.

Also, I will check out your training code again and adapt the CC loss to my model. Thank you again!