The reference scores for the evaluation are as follows. In the order voldiff, avgdist, tanerr, truepos, and falsepos, they are 68.29, 4.85, 75.16, 67.79, and 32.20, respectively. The score for voldiff and avgdist is computed as '100 - 10 * measurement / reference', where measurement is the value computed from the submitted segmentation and reference is the corresponding reference score. The score for truepos and falsepos is computed as '90 - 15 * (err - referr) / refstd', where err is the measured error of the submitted segmentation for that criterion, and referr and refstd are the mean and standard deviation of the same error measurement across the human rater segmentations.
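As a rough illustration, the two scoring formulas could be sketched in Python as below. The function names ratio_score and zscore_score are hypothetical, and the second formula assumes the reading '90 - 15 * (err - referr) / refstd', i.e. that the error is normalized by the human rater mean and standard deviation. This is a minimal sketch under those assumptions, not the evaluation's actual implementation.

```python
def ratio_score(measurement: float, reference: float) -> float:
    """Score for voldiff and avgdist: 100 - 10 * measurement / reference.

    Hypothetical helper; the name is not taken from the original rules.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 - 10.0 * measurement / reference


def zscore_score(err: float, referr: float, refstd: float) -> float:
    """Score for truepos and falsepos: 90 - 15 * (err - referr) / refstd.

    Assumes referr and refstd are the mean and standard deviation of the
    error among the human rater segmentations.
    """
    if refstd == 0:
        raise ValueError("refstd must be non-zero")
    return 90.0 - 15.0 * (err - referr) / refstd


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the evaluation.
    print(ratio_score(5.0, 10.0))
    print(zscore_score(2.0, 1.0, 0.5))
```

Under both formulas, a submission that exactly matches the reference (measurement equal to reference, or err equal to referr) receives a score of 90.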
For lesions, we need to implement the following to compare with previous work, as described here: http://www.ia.unc.edu/MSseg/rules.html