psychoinformatics-de / remodnav

Robust Eye Movement Detection for Natural Viewing

RF: Adjust code and test to work with the re-released Anderson data #3

Closed mih closed 5 years ago

mih commented 5 years ago

This is the successor of https://github.com/psychoinformatics-de/remodnav/pull/2 after the re-release of the Anderson et al data.

Some file pairs in the new release have inconsistent lengths:

  UL27_trial17_labelled_{}.mat [454, 455]
  TH34_img_vy_labelled_{}.mat [4990, 4988]
  TH38_video_dolphin_fov_labelled_{}.mat [4047, 4044]
  UL23_video_triple_jump_labelled_{}.mat [2823, 2821]
  UL27_video_triple_jump_labelled_{}.mat [2822, 2824]

The differences (in brackets) are at most 3 samples, so they should not make much of a difference. For the stats below, I truncate the longer file to the length of the shorter one.
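The truncation step could look like this minimal sketch (`truncate_pair` is an illustrative helper for this comment, not part of remodnav):

```python
import numpy as np

def truncate_pair(a, b):
    """Cut the longer of two per-sample arrays down to the shorter length,
    so file pairs with a 1-3 sample length mismatch can still be compared."""
    n = min(len(a), len(b))
    return a[:n], b[:n]

# e.g. UL27_trial17: 455 vs 454 samples
a, b = truncate_pair(np.arange(455), np.arange(454))
print(len(a), len(b))  # 454 454
```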

Duration stats

No change in the discrepancy pattern compared to the previous data release. Substantial deviations are in BOLD. Values reported in Anderson et al are given in parentheses. The values reported in this section are not based on a re-annotation, but on the annotations released by Anderson et al, so there should be no differences.

One possible explanation for the observed differences (which are limited to a small subset of stats, while the code to compute them is identical) is a difference between the released data and the data originally used. Another explanation could be a simple copy-editing error in the table that was not caught.
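For context, per-event durations can be derived from the sample-wise annotations by collapsing runs of identical labels into events. The helper below is an illustrative sketch (it assumes the 500 Hz sampling rate of the Anderson et al recordings), not remodnav's actual implementation:

```python
from itertools import groupby

def event_durations(labels, sr=500.0):
    """Group consecutive identical sample labels into events and return
    per-category lists of event durations in milliseconds.
    Illustrative helper assuming a 500 Hz sampling rate; not remodnav code."""
    out = {}
    for label, run in groupby(labels):
        out.setdefault(label, []).append(len(list(run)) / sr * 1000.0)
    return out

durs = event_durations(['FIX'] * 100 + ['SAC'] * 15 + ['FIX'] * 50)
print(durs)  # {'FIX': [200.0, 100.0], 'SAC': [30.0]}
```

The mean, SD, and count of each list correspond to the Mean/SD/No columns in the tables below.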

Fixation durations

| Coder | IMG-Mean | IMG-SD | IMG-No | VID-Mean | VID-SD | VID-No |
|-------|----------|--------|--------|----------|--------|--------|
| MN | 251 (248) | 285 (271) | 403 (380) | 304 (318) | 276 (289) | 82 (67) |
| RA | 247 (242) | 287 (273) | 391 (369) | 232 (240) | 177 (189) | 81 (67) |

Saccade durations

| Coder | IMG-Mean | IMG-SD | IMG-No | VID-Mean | VID-SD | VID-No |
|-------|----------|--------|--------|----------|--------|--------|
| MN | 29 (30) | 16 (17) | 377 (376) | 25 (26) | 12 (13) | 117 (116) |
| RA | 30 (31) | 15 (15) | 374 (372) | 25 (25) | 12 (12) | 127 (126) |

PSO durations

| Coder | IMG-Mean | IMG-SD | IMG-No | VID-Mean | VID-SD | VID-No |
|-------|----------|--------|--------|----------|--------|--------|
| MN | 21 (21) | 10 (11) | 313 (312) | 20 (20) | 11 (11) | 97 (97) |
| RA | 21 (21) | 9 (9) | 310 (309) | 17 (17) | 7 (8) | 89 (89) |

Pursuit durations

| Coder | IMG-Mean | IMG-SD | IMG-No | VID-Mean | VID-SD | VID-No |
|-------|----------|--------|--------|----------|--------|--------|
| MN | 363 (363) | 152 (187) | 3 (3) | 527 (521) | 343 (347) | 51 (50) |
| RA | 298 (305) | 174 (184) | 17 (16) | 481 (472) | 317 (319) | 70 (68) |

Mis-classification summary stats

For all pairwise comparisons, this shows: the overall misclassification rate (using timepoints as the unit of measure, and limited to timepoints labeled FIX, SAC, PSO, or PUR by either method, hence ignoring NaN/blinks and the rarely used "undefined" label); the same misclassification rate, but additionally ignoring PUR events; and the percentages of labels used in the misclassified samples. In contrast to the paper, the tables name the method that is misclassifying (rather than "over" and "under", which I found confusing).
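The comparison described above could be sketched roughly as follows. This is an illustrative reconstruction of the described procedure, not remodnav's evaluation code, and the exact handling of excluded PUR samples is an assumption:

```python
import numpy as np

EVENTS = ('FIX', 'SAC', 'PSO', 'PUR')

def mclf(labels_a, labels_b, exclude=()):
    """Timepoint-wise misclassification rate (%) between two labelings.

    Only timepoints where either method assigned one of the kept event
    labels are counted, so NaN/blink/undefined samples are ignored.
    Samples carrying an excluded label (e.g. PUR) in either labeling are
    dropped entirely -- an assumed interpretation of the 'w/o P' variant.
    """
    a = np.asarray(labels_a)
    b = np.asarray(labels_b)
    keep = [e for e in EVENTS if e not in exclude]
    mask = (np.isin(a, keep) | np.isin(b, keep)) \
        & ~np.isin(a, list(exclude)) & ~np.isin(b, list(exclude))
    return 100.0 * np.mean(a[mask] != b[mask])

a = ['FIX', 'FIX', 'SAC', 'PUR', 'NAN']
b = ['FIX', 'SAC', 'SAC', 'FIX', 'NAN']
print(mclf(a, b))                    # 50.0
print(mclf(a, b, exclude=('PUR',)))  # rate without pursuit samples
```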

The comparison of human coders comes out very similar to the stats reported in Anderson et al. I conclude that the implementation of the evaluation matches what they did.

images

Analogous to Table 8 in the paper


| Comparison | MCLF | MCLF w/o P | Method | Fix | Sacc | PSO | SP |
|------------|------|------------|--------|-----|------|-----|----|
| MN v RA | 6.2 | 3.1 | MN | 68 | 11 | 21 | 0 |
| | | | RA | 15 | 14 | 20 | 52 |
| MN v ALGO | 33.3 | 11.2 | MN | 88 | 1 | 10 | 1 |
| | | | ALGO | 2 | 16 | 8 | 74 |
| RA v ALGO | 33.6 | 10.4 | RA | 81 | 2 | 9 | 8 |
| | | | ALGO | 7 | 16 | 8 | 69 |

dots

Analogous to Table 9 in the paper

| Comparison | MCLF | MCLF w/o P | Method | Fix | Sacc | PSO | SP |
|------------|------|------------|--------|-----|------|-----|----|
| MN v RA | 10.7 | 4.2 | MN | 11 | 10 | 9 | 71 |
| | | | RA | 64 | 7 | 6 | 23 |
| MN v ALGO | 24.2 | 9.8 | MN | 12 | 1 | 6 | 80 |
| | | | ALGO | 72 | 8 | 6 | 14 |
| RA v ALGO | 26.8 | 10.5 | RA | 26 | 2 | 4 | 68 |
| | | | ALGO | 59 | 10 | 5 | 26 |

videos

Analogous to Table 10 in the paper

| Comparison | MCLF | MCLF w/o P | Method | Fix | Sacc | PSO | SP |
|------------|------|------------|--------|-----|------|-----|----|
| MN v RA | 18.5 | 4.0 | MN | 75 | 3 | 8 | 15 |
| | | | RA | 16 | 4 | 3 | 77 |
| MN v ALGO | 38.1 | 10.6 | MN | 37 | 1 | 5 | 58 |
| | | | ALGO | 54 | 9 | 6 | 31 |
| RA v ALGO | 38.6 | 11.7 | RA | 22 | 1 | 4 | 73 |
| | | | ALGO | 66 | 10 | 7 | 17 |

codecov-io commented 5 years ago

Codecov Report

Merging #3 into master will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master       #3   +/-   ##
=======================================
  Coverage   88.92%   88.92%           
=======================================
  Files           8        8           
  Lines         677      677           
=======================================
  Hits          602      602           
  Misses         75       75
| Impacted Files | Coverage Δ |
|----------------|------------|
| remodnav/tests/test_labeled.py | 100% <ø> (ø) :arrow_up: |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 5690254...f11028f.

adswa commented 5 years ago

> The difference (in brackets) is less than 3 samples so it should not make much of a difference. For the stats below I truncate the longer one to the size of the shorter one.

looks/sounds good!

> One possible conclusion for the observed differences (that are limited to a small subset of stats, while the code to compute them is identical) are differences in the released data vs the originally used data. Another explanation could be a simple copy editing error in the table that was not caught.

sounds like two likely explanations