jakewilliami / FaceDetection.jl

A face detection algorithm using Viola-Jones' rapid object detection framework written in Julia
MIT License

Fix discrepancy with pythonic results #46

Open jakewilliami opened 4 years ago

jakewilliami commented 4 years ago

There is a discrepancy between the results of this algorithm and those of the Python implementation. Both algorithms work, but they produce different results.

jakewilliami commented 4 years ago

After f9b07196, I began benchmarking the results of basic.jl against Simon Hohberg's example.py. When I started this, however, I realised (something I had forgotten until now) that, although both algorithms work correctly, there is a discrepancy in accuracy between my implementation and Hohberg's.

I spent around 12 hours straight last night, and into the wee hours, looking for the source of this discrepancy.

Upon inspection (results pushed in 3e9be4ad), here is what I found:

The earlier corrections I made did not change the results much; however, I am still unsure about the last one.

One way to test that the two algorithms (Python and Julia) follow the same procedure is simply to compare how many features each finds. Python found 2429 features for the standard test set, but Julia found 4520. Upon further inspection, the way I can fix this discrepancy is to change the x and the y in the inner-most loops to start searching from zero instead of one, and to subtract one from both end points (see the counting sketch below).
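To make the count comparison concrete, here is a minimal, hedged sketch of how the candidate-feature total falls out of the loop bounds. This is not the package's actual create_features code; the struct name, function names, and size bounds are purely illustrative.

```julia
# Illustrative sketch only: count candidate Haar-like feature positions so the
# total can be compared against the Python implementation's number.

struct FeatureSpec
    x::Int
    y::Int
    width::Int
    height::Int
end

# Enumerate every candidate position for one feature size.  The commented-out
# line mirrors Python's range(img_width - w), i.e. 0, 1, …, img_width - w - 1;
# the active line is the 1-based form discussed above.
function enumerate_positions(img_width::Int, img_height::Int, w::Int, h::Int)
    positions = FeatureSpec[]
    # for x in 0:(img_width - w - 1), y in 0:(img_height - h - 1)  # Python-style bounds
    for x in 1:(img_width - w), y in 1:(img_height - h)            # 1-based bounds
        push!(positions, FeatureSpec(x, y, w, h))
    end
    return positions
end

# Summing over all feature sizes gives one number to compare across languages.
function total_features(img_width, img_height, max_w, max_h)
    return sum(length(enumerate_positions(img_width, img_height, w, h))
               for w in 1:max_w, h in 1:max_h)
end
```

Running the same count (with the same size bounds) on the Python side should pin the 2429-versus-4520 difference down to specific loop bounds rather than to the rest of the pipeline.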

However, even when the number of features obtained is the same, the results are different. They are not hugely different (as I say, both algorithms work), but they are different.

As it doesn't make sense for Julia to index from zero, I have kept the inner-most loops in create_features searching from one to the end point (see 269f26e6). As a result, the number of features to search through is greater, and the results are closer to those of Hohberg's algorithm.

One thing I found is that Python reads directories in a seemingly arbitrary order, whereas Julia reads them alphabetically. I am unsure whether this explains the persisting discrepancy, but it does seem to change the results obtained for the classification_error vector (see the sketch below).
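If the remaining difference does come from directory order rather than the algorithm itself, fixing the order on both sides is an easy check. A small sketch, assuming a hypothetical helper name (the Python side would need the matching `sorted(os.listdir(dir))`):

```julia
# Illustrative helper, not part of the package: read image paths in a fixed,
# alphabetical order so both implementations see the files in the same order.
function sorted_image_paths(dir::AbstractString)
    # readdir returns file names; sorting explicitly and joining with the
    # directory makes the intended order obvious and portable.
    return joinpath.(dir, sort(readdir(dir)))
end
```

Feeding both implementations files in this fixed order would rule ordering in or out as the cause of the remaining difference in the classification_error vector.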

The question now is two-fold:

jakewilliami commented 3 years ago

[ef4015fe] There was another copy error, which changed results:

# Previous results
    Faces:     312/472 (66.10169491525424% of faces were recognised as faces)
Non-faces: 12894/19572 (65.87982832618026% of non-faces were identified as non-faces)

# After fixing copy error
    Faces:     235/472 (49.78813559322034% of faces were recognised as faces)
Non-faces: 15457/19572 (78.97506642141835% of non-faces were identified as non-faces)