psychoinformatics-de / remodnav

Robust Eye Movement Detection for Natural Viewing

RF: Adjust code and test to work with the re-released Anderson data #3

Closed mih closed 5 years ago

mih commented 5 years ago

This is the successor of https://github.com/psychoinformatics-de/remodnav/pull/2 after the re-release of the Anderson et al data.

Some file pairs in the new release have inconsistent lengths:

  UL27_trial17_labelled_{}.mat [454, 455]
  TH34_img_vy_labelled_{}.mat [4990, 4988]
  TH38_video_dolphin_fov_labelled_{}.mat [4047, 4044]
  UL23_video_triple_jump_labelled_{}.mat [2823, 2821]
  UL27_video_triple_jump_labelled_{}.mat [2822, 2824]

The differences (in brackets) are at most 3 samples, so they should not make much of a difference. For the stats below, I truncate the longer file to the length of the shorter one.
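The truncation step could look like this minimal sketch (`truncate_pair` is an illustrative helper for this comment, not part of remodnav):

```python
import numpy as np

def truncate_pair(a, b):
    """Cut the longer of two per-sample arrays down to the shorter length,
    so file pairs with a 1-3 sample length mismatch can still be compared."""
    n = min(len(a), len(b))
    return a[:n], b[:n]

# e.g. UL27_trial17: 455 vs 454 samples
a, b = truncate_pair(np.arange(455), np.arange(454))
print(len(a), len(b))  # 454 454
```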

Duration stats

No change in the discrepancy pattern compared to the previous data release. Substantial deviations are in BOLD. Values reported in Anderson et al are given in parentheses. The values reported in this section are not based on a re-annotation, but on the annotations released by Anderson et al, so there should be no differences.

One possible explanation for the observed differences (which are limited to a small subset of stats, while the code to compute them is identical) is a difference between the released data and the data originally used. Another explanation could be a simple copy-editing error in the table that was not caught.
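For context, per-event durations can be derived from the sample-wise annotations by collapsing runs of identical labels into events. The helper below is an illustrative sketch (it assumes the 500 Hz sampling rate of the Anderson et al recordings), not remodnav's actual implementation:

```python
from itertools import groupby

def event_durations(labels, sr=500.0):
    """Group consecutive identical sample labels into events and return
    per-category lists of event durations in milliseconds.
    Illustrative helper assuming a 500 Hz sampling rate; not remodnav code."""
    out = {}
    for label, run in groupby(labels):
        out.setdefault(label, []).append(len(list(run)) / sr * 1000.0)
    return out

durs = event_durations(['FIX'] * 100 + ['SAC'] * 15 + ['FIX'] * 50)
print(durs)  # {'FIX': [200.0, 100.0], 'SAC': [30.0]}
```

The mean, SD, and count of each list correspond to the Mean/SD/No columns in the tables below.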

Fixation durations

| Coder | IMG-Mean | IMG-SD | IMG-No | VID-Mean | VID-SD | VID-No |
|-------|----------|--------|--------|----------|--------|--------|
| MN | 251 (248) | 285 (271) | 403 (380) | 304 (318) | 276 (289) | 82 (67) |
| RA | 247 (242) | 287 (273) | 391 (369) | 232 (240) | 177 (189) | 81 (67) |

Saccade durations

| Coder | IMG-Mean | IMG-SD | IMG-No | VID-Mean | VID-SD | VID-No |
|-------|----------|--------|--------|----------|--------|--------|
| MN | 29 (30) | 16 (17) | 377 (376) | 25 (26) | 12 (13) | 117 (116) |
| RA | 30 (31) | 15 (15) | 374 (372) | 25 (25) | 12 (12) | 127 (126) |

PSO durations

| Coder | IMG-Mean | IMG-SD | IMG-No | VID-Mean | VID-SD | VID-No |
|-------|----------|--------|--------|----------|--------|--------|
| MN | 21 (21) | 10 (11) | 313 (312) | 20 (20) | 11 (11) | 97 (97) |
| RA | 21 (21) | 9 (9) | 310 (309) | 17 (17) | 7 (8) | 89 (89) |

Pursuit durations

| Coder | IMG-Mean | IMG-SD | IMG-No | VID-Mean | VID-SD | VID-No |
|-------|----------|--------|--------|----------|--------|--------|
| MN | 363 (363) | 152 (187) | 3 (3) | 527 (521) | 343 (347) | 51 (50) |
| RA | 298 (305) | 174 (184) | 17 (16) | 481 (472) | 317 (319) | 70 (68) |

Mis-classification summary stats

For all pairwise comparisons, this shows: the overall misclassification rate (using timepoints as the unit of measure, and limited to timepoints labeled FIX, SAC, PSO, or PUR by either method, hence ignoring NaN/blinks and the rarely used "undefined" label); the same misclassification rate, but additionally ignoring PUR events; and the percentages of labels used in the misclassified samples. In contrast to the paper, the tables name the method that is misclassifying (rather than "over" and "under", which I found confusing).
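The comparison described above could be sketched roughly as follows. This is an illustrative reconstruction of the described procedure, not remodnav's evaluation code, and the exact handling of excluded PUR samples is an assumption:

```python
import numpy as np

EVENTS = ('FIX', 'SAC', 'PSO', 'PUR')

def mclf(labels_a, labels_b, exclude=()):
    """Timepoint-wise misclassification rate (%) between two labelings.

    Only timepoints where either method assigned one of the kept event
    labels are counted, so NaN/blink/undefined samples are ignored.
    Samples carrying an excluded label (e.g. PUR) in either labeling are
    dropped entirely -- an assumed interpretation of the 'w/o P' variant.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    keep = [e for e in EVENTS if e not in exclude]
    mask = (np.isin(a, keep) | np.isin(b, keep)) \
        & ~np.isin(a, list(exclude)) & ~np.isin(b, list(exclude))
    return 100.0 * np.mean(a[mask] != b[mask])

a = ['FIX', 'FIX', 'SAC', 'PUR', 'NAN']
b = ['FIX', 'SAC', 'SAC', 'FIX', 'NAN']
print(mclf(a, b))                    # 50.0
print(mclf(a, b, exclude=('PUR',)))  # rate without pursuit samples
```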

The comparison of human coders comes out very similar to the stats reported in Anderson et al. I conclude that the implementation of the evaluation matches what they did.

images

Analogous to Table 8 in the paper


| Comparison | MCLF | MCLF w/o P | Method | Fix | Sacc | PSO | SP |
|------------|------|------------|--------|-----|------|-----|----|
| MN v RA | 6.2 | 3.1 | MN | 68 | 11 | 21 | 0 |
| | | | RA | 15 | 14 | 20 | 52 |
| MN v ALGO | 33.3 | 11.2 | MN | 88 | 1 | 10 | 1 |
| | | | ALGO | 2 | 16 | 8 | 74 |
| RA v ALGO | 33.6 | 10.4 | RA | 81 | 2 | 9 | 8 |
| | | | ALGO | 7 | 16 | 8 | 69 |

dots

Analogous to Table 9 in the paper

| Comparison | MCLF | MCLF w/o P | Method | Fix | Sacc | PSO | SP |
|------------|------|------------|--------|-----|------|-----|----|
| MN v RA | 10.7 | 4.2 | MN | 11 | 10 | 9 | 71 |
| | | | RA | 64 | 7 | 6 | 23 |
| MN v ALGO | 24.2 | 9.8 | MN | 12 | 1 | 6 | 80 |
| | | | ALGO | 72 | 8 | 6 | 14 |
| RA v ALGO | 26.8 | 10.5 | RA | 26 | 2 | 4 | 68 |
| | | | ALGO | 59 | 10 | 5 | 26 |

videos

Analogous to Table 10 in the paper

| Comparison | MCLF | MCLF w/o P | Method | Fix | Sacc | PSO | SP |
|------------|------|------------|--------|-----|------|-----|----|
| MN v RA | 18.5 | 4.0 | MN | 75 | 3 | 8 | 15 |
| | | | RA | 16 | 4 | 3 | 77 |
| MN v ALGO | 38.1 | 10.6 | MN | 37 | 1 | 5 | 58 |
| | | | ALGO | 54 | 9 | 6 | 31 |
| RA v ALGO | 38.6 | 11.7 | RA | 22 | 1 | 4 | 73 |
| | | | ALGO | 66 | 10 | 7 | 17 |

codecov-io commented 5 years ago

Codecov Report

Merging #3 into master will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master       #3   +/-   ##
=======================================
  Coverage   88.92%   88.92%           
=======================================
  Files           8        8           
  Lines         677      677           
=======================================
  Hits          602      602           
  Misses         75       75
| Impacted Files | Coverage Δ |
|----------------|------------|
| remodnav/tests/test_labeled.py | 100% <ø> (ø) :arrow_up: |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 5690254...f11028f.

adswa commented 5 years ago

> The difference (in brackets) is less than 3 samples so it should not make much of a difference. For the stats below I truncate the longer one to the size of the shorter one.

looks/sounds good!

> One possible conclusion for the observed differences (that are limited to a small subset of stats, while the code to compute them is identical) are differences in the released data vs the originally used data. Another explanation could be a simple copy editing error in the table that was not caught.

sounds like two likely explanations