Thank you all for participating in this year's BraTS Challenge.
This year we are introducing two new performance metrics: lesion-wise Dice score and lesion-wise 95th-percentile Hausdorff distance (HD95). They were developed to measure a model's performance at the lesion level rather than at the image level. Evaluating models lesion by lesion shows how well they detect and segment each abnormality, and it does not bias the results in favor of models that capture only large lesions.
Below is an outline of how we perform this analysis:
First, let’s take a pair of MR images: FLAIR and T1 post-contrast (T1Post). We assume that we have a segmentation model that can segment the lesions in this pair of images. On the right we have the segmentations predicted by the model.
Now, let’s analyze the predicted segmentation against the ground truth further. For simplicity we compare the lesions on the T1Post image. In this comparison we see that the predicted segmentation differs from the ground truth. The key differences are -
So, what we really want to emphasize here is that
On the left we have the ground truth (GT) and on the right we have the predictions. We perform a connected component analysis on the prediction mask and compare it, component by component, to the GT mask after combining GT lesions by dilation. Comparing the components one by one, we see that the model has missed the 2 lower lesions (false negatives) and has produced one false positive. It has also produced one lesion that overlaps a true lesion in the ground truth mask (a true positive), so we calculate the metrics for that lesion.
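The component-by-component comparison described above can be sketched as follows. This is a minimal illustration, not the challenge's evaluation code: the function name `match_lesions`, the rule that any voxel overlap counts as a match, and the toy masks are all assumptions made for the example, and the GT lesions are assumed to be labeled already.

```python
import numpy as np
from scipy import ndimage

# 26-connectivity in 3D: a 3x3x3 structuring element of ones.
CONN_26 = np.ones((3, 3, 3), dtype=bool)

def match_lesions(gt_labels, pred_mask):
    """Count TP, FN, FP lesions and compute one Dice score per GT lesion.

    Assumption for this sketch: a GT lesion is a true positive if any
    predicted component shares at least one voxel with it.
    """
    pred_labels, n_pred = ndimage.label(np.asarray(pred_mask, dtype=bool),
                                        structure=CONN_26)
    matched_pred = set()
    dice_scores = []
    tp = fn = 0
    for gt_id in np.unique(gt_labels[gt_labels > 0]):
        gt_lesion = gt_labels == gt_id
        overlapping = np.unique(pred_labels[gt_lesion & (pred_labels > 0)])
        if overlapping.size == 0:
            fn += 1                   # lesion missed entirely
            dice_scores.append(0.0)
            continue
        tp += 1
        matched_pred.update(overlapping.tolist())
        pred_lesion = np.isin(pred_labels, overlapping)
        inter = np.logical_and(gt_lesion, pred_lesion).sum()
        dice_scores.append(2.0 * inter / (gt_lesion.sum() + pred_lesion.sum()))
    fp = n_pred - len(matched_pred)   # predicted components matching no GT lesion
    return tp, fn, fp, dice_scores

# Toy volume: three GT lesions; the prediction hits one and adds a stray one.
gt = np.zeros((3, 10, 10), dtype=bool)
gt[1, 1:3, 1:3] = True   # lesion partly found by the model
gt[1, 6, 1] = True       # missed
gt[1, 8, 8] = True       # missed
pred = np.zeros_like(gt)
pred[1, 1:3, 1] = True   # overlaps the first GT lesion
pred[1, 4, 7] = True     # false positive

gt_labels, _ = ndimage.label(gt, structure=CONN_26)
tp, fn, fp, dice_scores = match_lesions(gt_labels, pred)
print(tp, fn, fp, dice_scores)
```

The toy case mirrors the figure described above: one true positive, two missed lesions and one false positive.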
To ensure we get the right number of GT lesions, we first dilate the GT mask and then combine the components that fall within the region of interest (ROI) of the same dilated component. To do that, we also run a 26-connectivity 3D connected component analysis on the dilated mask and use it to combine lesions within each ROI. On the left here we have the 4 components in the enhancing tissue; on the right we have the dilated ground truth enhancing tissue. To get the right number of ground truth lesions, we overlay the two and check whether multiple components fall into a single dilated ROI.
We see here that the 2 components on the upper right really belong to just one lesion, so we count them as one lesion for our analysis, yielding a total of 3 lesions instead of 4 in the enhancing tissue. We use this as the ground truth for our comparison against the model predictions.
To formalize this mathematically: the lesion-wise Dice score is the sum of the per-lesion Dice scores over the L GT lesions, divided by the total number of TP, FP and FN lesions. The same is done for HD95. Here L is the number of GT lesions that we obtain after dilation.
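The dilation-and-merge step can be sketched with SciPy as below. This is an illustrative sketch under stated assumptions: `combine_gt_lesions` and its `dilation_iters` parameter are hypothetical names standing in for the dilation factor, and the toy volume is made up to reproduce the "4 components, 3 lesions" situation.

```python
import numpy as np
from scipy import ndimage

# 26-connectivity in 3D: a 3x3x3 structuring element of ones.
CONN_26 = np.ones((3, 3, 3), dtype=bool)

def combine_gt_lesions(gt_mask, dilation_iters=1):
    """Label GT voxels so that components sharing one dilated ROI merge.

    Returns (lesion_labels, n_lesions). Each original GT voxel carries
    the label of the dilated component it falls into.
    """
    gt_mask = np.asarray(gt_mask, dtype=bool)
    dilated = ndimage.binary_dilation(gt_mask, structure=CONN_26,
                                      iterations=dilation_iters)
    dilated_labels, n_lesions = ndimage.label(dilated, structure=CONN_26)
    # Keep labels only on the original (undilated) GT voxels.
    lesion_labels = np.where(gt_mask, dilated_labels, 0)
    return lesion_labels, n_lesions

# Toy volume with 4 raw components; two of them sit one voxel apart.
gt = np.zeros((5, 12, 12), dtype=bool)
gt[2, 1, 1] = True
gt[2, 1, 3] = True    # close neighbour of the voxel above
gt[2, 6, 6] = True
gt[2, 10, 1] = True

_, n_raw = ndimage.label(gt, structure=CONN_26)
lesion_labels, n_lesions = combine_gt_lesions(gt, dilation_iters=1)
print(n_raw, n_lesions)
```

With one dilation iteration the two neighbouring components fall into the same dilated ROI and receive the same label, so the lesion count drops from 4 to 3.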
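As a worked example of this aggregation, here is a small sketch in plain Python. The function name `lesionwise_score` and the `fp_score` parameter are hypothetical. For Dice, charging 0 per false positive reproduces the formula described above. For HD95, missed and false positive lesions need some worst-case distance, which this text does not state, so the sketch leaves it as a value supplied by the caller.

```python
def lesionwise_score(gt_lesion_scores, n_fp, fp_score=0.0):
    """Average a per-lesion metric over TP + FN + FP lesions.

    gt_lesion_scores: one value per GT lesion (length L = TP + FN);
                      a missed lesion (FN) carries its worst-case value.
    n_fp:             number of predicted components matching no GT lesion.
    fp_score:         value charged per false positive (hypothetical
                      parameter; 0.0 is the natural choice for Dice).
    """
    denominator = len(gt_lesion_scores) + n_fp
    if denominator == 0:
        raise ValueError("no GT lesions and no predicted lesions")
    return (sum(gt_lesion_scores) + n_fp * fp_score) / denominator

# One TP with Dice 0.8, two FNs with Dice 0, and one FP.
example = lesionwise_score([0.8, 0.0, 0.0], n_fp=1)
print(example)
```

Here a single well-segmented lesion (Dice 0.8) is divided by four lesions in total, so the misses and the false positive pull the lesion-wise score down to about 0.2.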
Each challenge has set a volumetric threshold; participants' models are not evaluated on "small/false" lesions below it. This is done mainly so that participants are not penalized for stray voxels in the GT masks, which could be caused by human error, or for small lesions that aren’t related to the pathology of the challenge. The threshold was decided by clinical radiologists after they reviewed all the segmentation datasets. Here we have the table of threshold parameters. The same radiologists also decided the dilation factor used for combining lesions in the GT masks.
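Applying such a threshold could look like the sketch below. It assumes the GT lesions are already labeled with positive integers and that the threshold is expressed as a voxel count. The function name `drop_small_lesions` and the value used in the example are made up for illustration and are not the challenge's actual parameters.

```python
import numpy as np

def drop_small_lesions(lesion_labels, min_voxels):
    """Zero out labeled lesions that contain fewer than min_voxels voxels."""
    lesion_labels = np.asarray(lesion_labels)
    sizes = np.bincount(lesion_labels.ravel())   # voxel count per label
    keep = sizes >= min_voxels
    keep[0] = False                              # label 0 is background
    return np.where(keep[lesion_labels], lesion_labels, 0)

# Toy label map: lesion 1 has 4 voxels, lesion 2 is a single stray voxel.
labels = np.zeros((1, 4, 4), dtype=int)
labels[0, 0:2, 0:2] = 1
labels[0, 3, 3] = 2

filtered = drop_small_lesions(labels, min_voxels=2)
print(np.unique(filtered))
```

Only the lesion that reaches the threshold survives; the stray voxel is removed from the GT and therefore cannot count as a missed lesion.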
This project couldn't have been done without the help of
These metrics have been integrated into the Synapse platform by Verena Chung. Code
arXiv paper to follow...