mpatacchiola / deepgaze

Computer Vision library for human-computer interaction. It implements Head Pose and Gaze Direction Estimation Using Convolutional Neural Networks, Skin Detection through Backprojection, Motion Detection and Tracking, Saliency Map.
MIT License

Difference between the original C++ implementation and the Python implementation in this repo #57

Closed: v18saboo closed this issue 5 years ago

v18saboo commented 6 years ago

Hi, I ran both the original authors' implementation and this one on the same horse image they used, and compared the results. There seems to be quite a large difference between the resulting saliency maps. Can you tell me why? Do OpenCV's internal functions behave differently in C++ and Python? I've attached a screenshot that shows the difference. I ran both without any threshold filtering or Gaussian blurring. The saliency map on the left is the C++ output; the one on the right is the Python output.

[Screenshot, 2018-07-10: C++ saliency map (left) vs. Python saliency map (right)]

Any help is appreciated!
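To put a number on "quite a large difference" instead of judging the two maps by eye, a sketch like the following could be used. It assumes both maps are available as grayscale arrays of the same size, and it uses two simple measures of our own choosing (mean absolute difference after min-max rescaling, and Pearson correlation); neither deepgaze nor FASA provides these, and the function names are hypothetical.

```python
import numpy as np

def _minmax(saliency):
    """Rescale a saliency map to [0, 1]; a constant map becomes all zeros."""
    s = np.asarray(saliency, dtype=np.float64)
    lo, hi = s.min(), s.max()
    if hi == lo:
        return np.zeros_like(s)
    return (s - lo) / (hi - lo)

def compare_saliency_maps(map_a, map_b):
    """Return the mean absolute difference and Pearson correlation of two maps."""
    a, b = _minmax(map_a), _minmax(map_b)
    if a.shape != b.shape:
        raise ValueError("saliency maps must have the same shape")
    # Mean absolute difference on the [0, 1] scale
    mae = float(np.mean(np.abs(a - b)))
    # Pearson correlation: 1.0 means the maps agree up to a linear rescaling
    a_c, b_c = a - a.mean(), b - b.mean()
    denom = np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2))
    cc = float(np.sum(a_c * b_c) / denom) if denom > 0 else 0.0
    return {"mae": mae, "cc": cc}
```

For example, both maps could be loaded with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)` (paths such as `saliency_cpp.png` and `saliency_python.png` are hypothetical) and passed to `compare_saliency_maps()`. A correlation close to 1 would suggest the maps differ mostly by a linear rescaling rather than in which regions are salient.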

mpatacchiola commented 6 years ago

Hi @varunsaboo

Yes, there are some differences between the deepgaze Python implementation and the original C++ code. However, it is not clear which stage of the pipeline these differences come from.

The authors of the FASA algorithm were also involved at some point, but despite their help it was not possible to pin down the origin of the differences. At this point I think they may be due to different normalization operations in functions such as _bilateral_filtering() and _calculate_probability().
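As a toy illustration of why normalization choices matter (it does not reproduce the actual _bilateral_filtering() or _calculate_probability() code of either implementation), the sketch below converts the same raw saliency values to 8 bits in two common ways. The function names are hypothetical.

```python
import numpy as np

def to_uint8_divide_by_max(saliency):
    """Scale by the maximum only, so the smallest value is not mapped to 0."""
    s = np.asarray(saliency, dtype=np.float64)
    peak = s.max()
    if peak == 0:
        return np.zeros(s.shape, dtype=np.uint8)
    return np.clip(np.rint(s / peak * 255.0), 0, 255).astype(np.uint8)

def to_uint8_minmax(saliency):
    """Stretch [min, max] linearly onto [0, 255] (min-max normalization)."""
    s = np.asarray(saliency, dtype=np.float64)
    span = s.max() - s.min()
    if span == 0:
        return np.zeros(s.shape, dtype=np.uint8)
    return np.clip(np.rint((s - s.min()) / span * 255.0), 0, 255).astype(np.uint8)

# The same raw values end up as different 8-bit maps
raw = np.array([1.0, 2.0, 5.0])
print(to_uint8_divide_by_max(raw))  # [ 51 102 255]
print(to_uint8_minmax(raw))         # [  0  64 255]
```

If one implementation applied the first kind of scaling and the other the second at some intermediate step, the final maps would differ even with otherwise identical logic.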

If you find any useful information or manage to improve the quality of the results, send a pull request and it will be integrated into the current code.

v18saboo commented 6 years ago

A weird thing I noticed is that minMaxLoc() returns different minVal and maxVal values in C++ and in Python. I'm not sure why, but could this be the root of the problem? For the horse image from your examples, in the order [minL, maxL, minA, maxA, minB, maxB]:

Python returns: 0.0 255.0 95.0 169.0 116.0 180.0
C++ returns: 0.0 255.0 96.0 166.0 116.0 179.0
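Since cv2.minMaxLoc() only accepts single-channel arrays, the ranges above come from checking the L, a and b channels one at a time. A minimal NumPy sketch of the same per-channel check, plus a helper that lists which statistics disagree, might look like this (the function names are hypothetical; the numbers are the ones reported above):

```python
import numpy as np

CHANNEL_NAMES = ["minL", "maxL", "minA", "maxA", "minB", "maxB"]

def lab_channel_ranges(lab_image):
    """Per-channel [min, max] of an HxWx3 Lab image, in CHANNEL_NAMES order.

    Gives the same numbers as minVal/maxVal from cv2.minMaxLoc() on each channel.
    """
    lab = np.asarray(lab_image)
    if lab.ndim != 3 or lab.shape[2] != 3:
        raise ValueError("expected an HxWx3 image")
    ranges = []
    for c in range(3):
        channel = lab[:, :, c]
        ranges += [float(channel.min()), float(channel.max())]
    return ranges

def range_mismatches(python_ranges, cpp_ranges):
    """Map each mismatching statistic to its (python_value, cpp_value) pair."""
    return {name: (p, c)
            for name, p, c in zip(CHANNEL_NAMES, python_ranges, cpp_ranges)
            if p != c}

python_ranges = [0.0, 255.0, 95.0, 169.0, 116.0, 180.0]
cpp_ranges = [0.0, 255.0, 96.0, 166.0, 116.0, 179.0]
print(range_mismatches(python_ranges, cpp_ranges))
# {'minA': (95.0, 96.0), 'maxA': (169.0, 166.0), 'maxB': (180.0, 179.0)}
```

Only the a and b channels disagree, which is consistent with the differences arising before or during the LAB conversion rather than in later steps.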

mpatacchiola commented 6 years ago

That's quite strange indeed. I can rule out a round-off error, since there is a large gap between 169 and 166 in maxA. The difference may come from the way the image is converted to LAB before being passed to the method _calculate_histogram(). In the Python code I used the OpenCV built-in functions in this block of code:

```python
import cv2

# Convert the input image to the color space expected by the saliency pipeline
if format == 'BGR2LAB':
    image = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)
elif format == 'BGR2RGB':
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
elif format == 'RGB2LAB':
    image = cv2.cvtColor(image, cv2.COLOR_RGB2LAB)
elif format == 'RGB' or format == 'BGR' or format == 'LAB':
    pass  # the image is already in a supported format
else:
    raise ValueError('[DEEPGAZE][SALIENCY-MAP][ERROR] the input format of the image is not supported.')
```

However, I do not remember how the conversion is done in the original C++ code. Probably the C++ code uses a different method to convert to LAB.
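One way to test this hypothesis would be to run the horse image through an explicit conversion and compare its channel ranges with both outputs. The sketch below follows the float formulas in the OpenCV color-conversion documentation (sRGB/D65 matrix, 8-bit encoding with L scaled by 255/100 and 128 added to a and b). The function name `rgb_to_lab8()` and the `linearize_gamma` switch are hypothetical; we do not know which variant, if either, the original C++ code uses. OpenCV's own 8-bit path may use lookup tables, so its output can differ from this float version by about one unit per value.

```python
import numpy as np

# sRGB (D65) -> CIE XYZ coefficients, as listed in the OpenCV documentation
_RGB_TO_XYZ = np.array([
    [0.412453, 0.357580, 0.180423],
    [0.212671, 0.715160, 0.072169],
    [0.019334, 0.119193, 0.950227],
])
_WHITE = np.array([0.950456, 1.0, 1.088754])  # D65 reference white (Xn, Yn, Zn)

def _f(t):
    # CIE Lab companding function, with the linear segment near zero
    return np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)

def rgb_to_lab8(rgb, linearize_gamma=True):
    """Convert an HxWx3 uint8 RGB image to 8-bit Lab (OpenCV-style encoding)."""
    c = np.asarray(rgb, dtype=np.float64) / 255.0
    if linearize_gamma:
        # Undo the sRGB transfer curve before applying the matrix (assumed choice)
        c = np.where(c > 0.04045, ((c + 0.055) / 1.055) ** 2.4, c / 12.92)
    xyz = c @ _RGB_TO_XYZ.T / _WHITE
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    fy = _f(y)
    L = np.where(y > 0.008856, 116.0 * fy - 16.0, 903.3 * y)
    a = 500.0 * (_f(x) - fy)
    b = 200.0 * (fy - _f(z))
    # 8-bit encoding: L is stretched to 0..255, a and b are offset by 128
    lab8 = np.stack([L * 255.0 / 100.0, a + 128.0, b + 128.0], axis=-1)
    return np.clip(np.rint(lab8), 0, 255).astype(np.uint8)
```

`cv2.imread()` returns pixels in BGR order, so the channels would need reversing (`image[..., ::-1]`) before calling `rgb_to_lab8()`. For a mid-gray pixel (128, 128, 128), the two settings give L values of about 137 and 194 while a and b stay at 128, which shows how strongly a single conversion choice can move the channel statistics.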

mpatacchiola commented 5 years ago

Closed due to inactivity.