mih closed this 5 years ago
Merging #2 into master will increase coverage by 0.59%. The diff coverage is n/a.
```diff
@@           Coverage Diff            @@
##           master      #2     +/-  ##
==========================================
+ Coverage   88.92%   89.51%   +0.59%
==========================================
  Files           8        8
  Lines         677      677
==========================================
+ Hits          602      606       +4
+ Misses         75       71       -4
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| remodnav/tests/test_detect.py | 98.57% <0%> (+5.71%) | :arrow_up: |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 64e2534...944bfb1. Read the comment docs.
These results look neat. Thanks a lot for all the work!
I have not succeeded in finding a definite explanation for the weird occasional differences in fixation and pursuit duration. However, their paper was published online/accepted in 2016, while the main analysis code was published on GitHub in Aug 2017 with the commit message 'Added code for processing data extracted from algorithms, including latest(?) version of extracted data. More code to follow.' Maybe these 'latest(?)' versions of the extracted data account for the differences between the results reported in the paper and their more recent code. At the very least it suggests that something more recent than the version used for the paper was shared a year after the paper was accepted.
And to quote their readme:
"Some values in the scripts may have been interactively changed during the analysis, so it should not be a interpreted as run-once-for-complete-results code."
as well as
"Some of the matlab code used for the publication"
with an emphasis on some: I wasn't able to find some functions anywhere, e.g. their function simpleAgreement (used in mainDetection for calculating the proportion of correct classifications). So in my opinion it seems impossible, or at least infeasible, to figure out the exact way of their computations.
The way you computed the results looks perfectly reasonable to me, regardless of whether this may or may not be a 100% replication of their analysis. It's cool that the performance without pursuits exceeds their stats. For the results with added pursuits, remodnav still shows good performance, and it's understandable that adding a classification label to the confusion matrix is more likely to decrease overall classification accuracy compared to the paper's "only top 3 events" approach.
@AdinaWagner Thx!
I have updated the top comment with plots that also compare MN to RA to give us a baseline reference. I also added the analog of tables 8-10 to the top comment. Still looking good.
OK, the test passed -- I will merge this now to give us a starting point for the final stretch.
Richard Anderson presumably uploaded all of the files. I will try to rerun the script locally to see whether the results finally reproduce the article's once I include the previously missing file.
Maybe I'm missing something here, or maybe I don't fully understand the part of the script with the confusion matrices.
The most interesting question answered first: the missing file seems to have been UH33_trial17_labelled_MN.mat. The new data directory is luckily more organized than before, and only paired data files are present.
I tried to feed the new data directory into @mih's eval/anderson.py script. This works well for the print_duration stats and yields, unsurprisingly, the same results as before in the first summary tables of this conversation (here is the terminal output, for anyone interested in checking):
```
images MN  FIX: 0.252 (0.285) [403]  SAC: 0.029 (0.017) [377]  PSO: 0.021 (0.011) [313]  PURS: 0.363 (0.153) [3]
images RA  FIX: 0.247 (0.288) [391]  SAC: 0.031 (0.015) [374]  PSO: 0.021 (0.009) [310]  PURS: 0.299 (0.175) [17]
dots   MN  FIX: 0.191 (0.088) [12]   SAC: 0.023 (0.010) [47]   PSO: 0.015 (0.005) [33]   PURS: 0.363 (0.233) [48]
dots   RA  FIX: 0.168 (0.090) [21]   SAC: 0.022 (0.011) [47]   PSO: 0.015 (0.008) [28]   PURS: 0.367 (0.329) [45]
videos MN  FIX: 0.304 (0.277) [82]   SAC: 0.026 (0.013) [117]  PSO: 0.020 (0.011) [97]   PURS: 0.528 (0.344) [51]
videos RA  FIX: 0.232 (0.177) [81]   SAC: 0.025 (0.012) [127]  PSO: 0.017 (0.008) [89]   PURS: 0.481 (0.317) [70]
```
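For reference, a minimal sketch of how such "mean (SD) [N]" duration summaries can be produced. The function name is hypothetical, and it assumes numpy's default population SD (ddof=0); the actual computation lives in eval/anderson.py and may differ:

```python
import numpy as np

def duration_stats(durations):
    # Format event durations (in seconds) as 'mean (SD) [N]'.
    # Uses numpy's default population SD (ddof=0); whether the
    # original script does the same is an assumption.
    d = np.asarray(durations, dtype=float)
    return '%.3f (%.3f) [%d]' % (d.mean(), d.std(), len(d))

stats = duration_stats([0.2, 0.3, 0.25])
```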
However, the confusion() function fails for comparisons of the two human coders (i.e. when running confusion('RA', 'MN'), but not when running the comparison with the 'ALGO' option).
The error is due to a mismatch in shape, for example:

```
<ipython-input-99-317f50ae23b8> in confusion(refcoder, coder)
     70         intersec = np.sum(np.logical_and(
     71             labels[0] == anderson_remap[c1label],
---> 72             labels[1] == anderson_remap[c2label]))
     73         union = np.sum(np.logical_or(
     74             labels[0] == anderson_remap[c1label],

ValueError: operands could not be broadcast together with shapes (4990,) (4988,)
```
I've tried this for a bunch of input files, and the funny thing is, this error emerges for a couple of files but not for all of them.
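The failure boils down to numpy refusing to broadcast two label vectors of unequal length. A minimal sketch of a debugging guard (the function name `safe_confusion_cell` and the toy labels are hypothetical; trimming to the common length is only a diagnostic aid, since the real question is why the two recordings differ in length at all):

```python
import numpy as np

def safe_confusion_cell(labels1, labels2, c1, c2):
    # Trim both label vectors to their common length before the
    # elementwise comparison that raised the ValueError above.
    # Diagnostic aid only -- a length mismatch likely means the
    # underlying recordings differ and should be investigated.
    n = min(len(labels1), len(labels2))
    if len(labels1) != len(labels2):
        print('length mismatch: %d vs %d, trimming to %d'
              % (len(labels1), len(labels2), n))
    l1, l2 = labels1[:n], labels2[:n]
    intersec = int(np.sum(np.logical_and(l1 == c1, l2 == c2)))
    union = int(np.sum(np.logical_or(l1 == c1, l2 == c2)))
    return intersec, union

# toy stand-ins for the (4990,) vs (4988,) label arrays
a = np.array([1, 1, 2, 2, 1])
b = np.array([1, 2, 2, 1])
inter, uni = safe_confusion_cell(a, b, 1, 2)
```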
I ran `diff -r` between the old and the new data directory, and the error emerges consistently, and only, for those files where `diff -r` indicates that the content has changed from the old directory to the new one:
```
╭─adina@odin ~/Repos/remodnav/remodnav/tests/data on new_anderson+!
╰─➤ diff -r anderson_etal/annotated_data/complete_data/images anderson_etal_old/annotated_data/images 2 ↵
Binary files anderson_etal/annotated_data/complete_data/images/TH34_img_vy_labelled_MN.mat and anderson_etal_old/annotated_data/images/TH34_img_vy_labelled_MN.mat differ
Only in anderson_etal_old/annotated_data/images: TH38_img_Europe_labelled_RA.mat
Only in anderson_etal_old/annotated_data/images: TH46_img_Rome_labelled_RA.mat
Only in anderson_etal_old/annotated_data/images: TH50_img_vy_labelled_RA.mat
Only in anderson_etal_old/annotated_data/images: TL44_img_konijntjes_labelled_RA.mat
Only in anderson_etal_old/annotated_data/images: TL48_img_Europe_labelled_RA.mat
Only in anderson_etal_old/annotated_data/images: TL48_img_Rome_labelled_RA.mat
╭─adina@odin ~/Repos/remodnav/remodnav/tests/data on new_anderson+!
╰─➤ diff -r anderson_etal/annotated_data/complete_data/videos anderson_etal_old/annotated_data/videos 1 ↵
Binary files anderson_etal/annotated_data/complete_data/videos/TH38_video_dolphin_fov_labelled_MN.mat and anderson_etal_old/annotated_data/videos/TH38_video_dolphin_fov_labelled_MN.mat differ
Only in anderson_etal_old/annotated_data/videos: TH46_video_BergoDalbana_labelled_RA.mat
Only in anderson_etal_old/annotated_data/videos: TH46_video_BiljardKlipp_labelled_RA.mat
Only in anderson_etal_old/annotated_data/videos: TH50_video_TrafikEhuset_labelled_RA.mat
Only in anderson_etal_old/annotated_data/videos: TL32_video_triple_jump_labelled_RA.mat
Only in anderson_etal_old/annotated_data/videos: TL40_video_BiljardKlipp_labelled_RA.mat
Only in anderson_etal_old/annotated_data/videos: TL44_video_triple_jump_labelled_RA.mat
Only in anderson_etal_old/annotated_data/videos: TL48_video_TrafikEhuset_labelled_RA.mat
Only in anderson_etal_old/annotated_data/videos: UH27_video_TrafikEhuset_labelled_RA.mat
Binary files anderson_etal/annotated_data/complete_data/videos/UL23_video_triple_jump_labelled_MN.mat and anderson_etal_old/annotated_data/videos/UL23_video_triple_jump_labelled_MN.mat differ
Binary files anderson_etal/annotated_data/complete_data/videos/UL23_video_triple_jump_labelled_RA.mat and anderson_etal_old/annotated_data/videos/UL23_video_triple_jump_labelled_RA.mat differ
Binary files anderson_etal/annotated_data/complete_data/videos/UL27_video_triple_jump_labelled_MN.mat and anderson_etal_old/annotated_data/videos/UL27_video_triple_jump_labelled_MN.mat differ
Binary files anderson_etal/annotated_data/complete_data/videos/UL27_video_triple_jump_labelled_RA.mat and anderson_etal_old/annotated_data/videos/UL27_video_triple_jump_labelled_RA.mat differ
Binary files anderson_etal/annotated_data/complete_data/videos/UL31_video_triple_jump_labelled_MN.mat and anderson_etal_old/annotated_data/videos/UL31_video_triple_jump_labelled_MN.mat differ
Binary files anderson_etal/annotated_data/complete_data/videos/UL31_video_triple_jump_labelled_RA.mat and anderson_etal_old/annotated_data/videos/UL31_video_triple_jump_labelled_RA.mat differ
Only in anderson_etal_old/annotated_data/videos: UL43_video_TrafikEhuset_labelled_RA.mat
Only in anderson_etal_old/annotated_data/videos: UL47_video_BiljardKlipp_labelled_RA.mat
╭─adina@odin ~/Repos/remodnav/remodnav/tests/data on new_anderson+!
╰─➤ diff -r anderson_etal/annotated_data/complete_data/dots anderson_etal_old/annotated_data/dots 1 ↵
Only in anderson_etal_old/annotated_data/dots: TH34_trial17_labelled_RA.mat
Only in anderson_etal_old/annotated_data/dots: TH36_trial17_labelled_RA.mat
Only in anderson_etal_old/annotated_data/dots: TH38_trial17_labelled_RA.mat
Only in anderson_etal_old/annotated_data/dots: TH50_trial1_labelled_RA.mat
Only in anderson_etal_old/annotated_data/dots: TL24_trial1_labelled_RA.mat
Only in anderson_etal_old/annotated_data/dots: TL32_trial17_labelled_RA.mat
Only in anderson_etal_old/annotated_data/dots: TL32_trial1_labelled_RA.mat
Only in anderson_etal_old/annotated_data/dots: TL44_trial1_labelled_RA.mat
Binary files anderson_etal/annotated_data/complete_data/dots/UH21_trial1_labelled_MN.mat and anderson_etal_old/annotated_data/dots/UH21_trial1_labelled_MN.mat differ
Binary files anderson_etal/annotated_data/complete_data/dots/UH21_trial1_labelled_RA.mat and anderson_etal_old/annotated_data/dots/UH21_trial1_labelled_RA.mat differ
Only in anderson_etal_old/annotated_data/dots: UH31_trial17_labelled_RA.mat
Only in anderson_etal/annotated_data/complete_data/dots: UH33_trial17_labelled_MN.mat
Binary files anderson_etal/annotated_data/complete_data/dots/UH33_trial17_labelled_RA.mat and anderson_etal_old/annotated_data/dots/UH33_trial17_labelled_RA.mat differ
Only in anderson_etal_old/annotated_data/dots: UH33_trial1_labelled_MN.mat
Only in anderson_etal_old/annotated_data/dots: UL25_trial17_labelled_RA.mat
Binary files anderson_etal/annotated_data/complete_data/dots/UL27_trial17_labelled_MN.mat and anderson_etal_old/annotated_data/dots/UL27_trial17_labelled_MN.mat differ
Binary files anderson_etal/annotated_data/complete_data/dots/UL27_trial17_labelled_RA.mat and anderson_etal_old/annotated_data/dots/UL27_trial17_labelled_RA.mat differ
Only in anderson_etal_old/annotated_data/dots: UL29_trial17_labelled_RA.mat
Binary files anderson_etal/annotated_data/complete_data/dots/UL39_trial1_labelled_MN.mat and anderson_etal_old/annotated_data/dots/UL39_trial1_labelled_MN.mat differ
Binary files anderson_etal/annotated_data/complete_data/dots/UL39_trial1_labelled_RA.mat and anderson_etal_old/annotated_data/dots/UL39_trial1_labelled_RA.mat differ
Only in anderson_etal_old/annotated_data/dots: UL47_trial1_labelled_RA.mat
╭─adina@odin ~/Repos/remodnav/remodnav/tests/data on new_anderson+!
```
If anyone has an idea what I am missing here, I'd be grateful for enlightenment -- that is, if it is worth pursuing this confusion at all. I was able to compute the confusion matrices between human and algorithm, and they show negligible differences (in the dots category).
Cheers!
Thx, will push a PR with the needed changes in a few min.
Thx!
ATM we cannot reproduce the paper's numbers (the reason is subject to research). Here is a summary of the differences: duration stats for fixations, saccades, PSOs, and pursuits. The first value is ours, the one in parentheses is the one reported in the paper. Note that our values are not our own detection results, but stats computed from their released data. Subjectively substantial deviations are in BOLD, although there really should be no deviations at all, and the code to compute all stats is the same for all events (see PR).
Conclusion: We get what is in the paper for saccades and PSOs, but something substantially different for the number of fixations.
Fixation durations
Saccade durations
PSO durations
Pursuit durations
Confusions
Assuming we make no mistakes extracting the Anderson labels, here is how our algorithm performs re confusions.
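Conceptually, the timepoint-wise confusion counts underlying these comparisons can be sketched as follows. This is a simplified stand-in with hypothetical names; the actual label extraction and remapping happen in eval/anderson.py:

```python
import numpy as np

EVENTS = ('FIX', 'SAC', 'PSO', 'PURS')

def confusion_matrix(ref, other):
    # Count, for every pair of event labels, how many timepoints the
    # reference coder labeled as the row event while the other coder
    # (or the algorithm) labeled them as the column event.
    mat = np.zeros((len(EVENTS), len(EVENTS)), dtype=int)
    for i, r in enumerate(EVENTS):
        for j, o in enumerate(EVENTS):
            mat[i, j] = np.sum((ref == r) & (other == o))
    return mat

# toy label sequences standing in for two coders' sample-wise labels
ref = np.array(['FIX', 'FIX', 'SAC', 'PSO'])
oth = np.array(['FIX', 'SAC', 'SAC', 'PSO'])
m = confusion_matrix(ref, oth)
```

The diagonal holds agreements; off-diagonal cells show which events get mixed up.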
MN vs. RA
This gives us the baseline
algorithm vs. coder MN
algorithm vs. coder RA
Mis-classification summary stats
For all pairwise comparisons, this shows the overall misclassification rate (using timepoints as the unit of measure, and limited to timepoints that have been labeled with FIX, SAC, PSO, or PUR by any method, hence ignoring NaN/blinks and 'undefined', which is rarely used); then the same misclassification rate as before, but ignoring PUR events too. The remaining numbers are percentages of labels used in the misclassified samples. In contrast to the paper, the method label that is misclassifying is given (not "over" and "under", as I found this confusing).
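The rate described above can be sketched like this. The function name and the exact handling of ignored labels are assumptions about the stated procedure, not the actual analysis code:

```python
import numpy as np

EVENTS = {'FIX', 'SAC', 'PSO', 'PUR'}

def misclassification_rate(ref, other, ignore=()):
    # Overall misclassification rate over timepoints, limited to
    # timepoints labeled with one of the four events by any method
    # (NaN/blinks and 'undefined' drop out); events in `ignore`
    # (e.g. PUR) are treated like unlabeled timepoints.
    events = EVENTS - set(ignore)
    keep = np.array([
        (a in events or b in events)
        and a not in ignore and b not in ignore
        for a, b in zip(ref, other)])
    return float(np.mean(ref[keep] != other[keep]))

# toy sample-wise labels from two methods
ref = np.array(['FIX', 'FIX', 'SAC', 'NaN', 'PUR'])
oth = np.array(['FIX', 'SAC', 'SAC', 'NaN', 'FIX'])
r_all = misclassification_rate(ref, oth)
r_nopur = misclassification_rate(ref, oth, ignore=('PUR',))
```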
images
Analog to table 8 in the paper
dots
Analog to table 9 in the paper
videos
Analog to table 10 in the paper
Interim conclusion
Performance looks good. Without pursuit this looks better than the stats in the paper (although I am not 100% confident that we compute things the exact same way). Confusion patterns with and without pursuit look sensible.
Critical feedback appreciated!