What threshold to use on model prediction?

jcohenadad commented 6 months ago

The model generates predictions that are too soft: some voxels are non-zero even more than 5 voxels away from the cord. This clearly does not reflect partial volume; it is more likely the result of excessive softness in our ground truth.
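One way to quantify this is to count the non-zero prediction voxels that lie beyond a given distance from the cord. The sketch below is hypothetical (the function name and the use of a binary cord mask as reference are assumptions, not part of this project); it relies on `scipy.ndimage.distance_transform_edt` and measures distance in voxels unless a `sampling` (voxel size) is passed.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def count_far_nonzero(pred, cord_mask, min_dist=5.0, sampling=None):
    """Hypothetical diagnostic: count voxels with pred > 0 lying more than
    `min_dist` away from the cord (voxel units unless `sampling` is given)."""
    # distance_transform_edt returns, for every non-zero input voxel, the
    # distance to the nearest zero voxel. Feeding it the *inverted* cord mask
    # gives each voxel's distance to the nearest cord voxel (0 inside the cord).
    dist_to_cord = distance_transform_edt(~cord_mask.astype(bool), sampling=sampling)
    far_from_cord = dist_to_cord > min_dist
    return int(np.count_nonzero((pred > 0) & far_from_cord))
```

In practice `pred` and `cord_mask` would be loaded from NIfTI files (e.g. with nibabel); that loading step is omitted here.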

Some reports: https://docs.google.com/presentation/d/1cmYdhSQieDN7c2QTx6suroGeL7P-3zmkF_8N2KVX1q0/edit#slide=id.g2b082a87bce_9_7

naga-karthik commented 4 months ago

Since the issue is a bit old now, here's a summary of the important updates:

As suggested in the referenced issue above, we are moving forward with the soft_bin model, i.e., the SoftSeg model trained on binarized soft segmentations. The original soft segmentations are binarized offline before training the model (hence the name soft_bin). NOTE that this is different from the original model, which was trained directly on the soft GTs.
Even after training the soft_bin model, the predicted outputs contain very small values (e.g., 1e-5, 1e-6) outside the SC. This is expected, and it is why the model outputs always need to be thresholded/binarized rather than used raw.
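A minimal sketch of the offline binarization step, assuming a threshold of 0.5 (the value mentioned later in this thread) and assuming that voxels equal to the threshold count as foreground; the exact comparison used in the project is not stated here. Both function names are hypothetical.

```python
import numpy as np


def binarize_soft_gt(soft_gt: np.ndarray, thr: float = 0.5) -> np.ndarray:
    """Turn a soft segmentation (values in [0, 1]) into a 0/1 mask."""
    # Assumption: values >= thr are foreground; the project may use > instead.
    return (soft_gt >= thr).astype(np.uint8)


def binarize_soft_gt_file(in_path: str, out_path: str, thr: float = 0.5) -> None:
    """Hypothetical helper: binarize a NIfTI soft GT, keeping its geometry."""
    import nibabel as nib  # local import so the array-only helper has no nibabel dependency

    img = nib.load(in_path)
    mask = binarize_soft_gt(np.asarray(img.get_fdata()), thr)
    out = nib.Nifti1Image(mask, img.affine, img.header)
    out.set_data_dtype(np.uint8)  # store labels as integers, not floats
    nib.save(out, out_path)
```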

I ran this script to understand the effect of different thresholds on the CSA using the soft_bin model. Shown below is the STD of the CSA across contrasts for each threshold. Each scatter point in the violin plot is one (test) subject’s average CSA across contrasts.

STD CSA across thresholds

![std_csa_threshold](https://github.com/sct-pipeline/contrast-agnostic-softseg-spinalcord/assets/53445351/81ef5570-e294-4892-bb61-346972ab064b)

Based on the plot, the STD of the CSA across contrasts is similar for all thresholds. Therefore, we can go with threshold=0.1, as it has the smallest mean STD (shown as text above each violin).
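The selection rule described above (per-subject STD of the CSA across contrasts, averaged over subjects, then the threshold with the smallest mean) could be sketched as follows. This is not the script referenced in this thread: the nested-dict layout `csa[threshold][subject][contrast]`, the use of the population STD (`np.std` with `ddof=0`), and the function names are all assumptions made for illustration.

```python
import numpy as np


def mean_std_per_threshold(csa):
    """csa[thr][subject][contrast] -> {thr: mean over subjects of the
    STD of the CSA across contrasts}. Hypothetical data layout."""
    scores = {}
    for thr, subjects in csa.items():
        # One STD value per subject, computed across that subject's contrasts.
        per_subject_std = [np.std(list(contrasts.values())) for contrasts in subjects.values()]
        scores[thr] = float(np.mean(per_subject_std))
    return scores


def pick_threshold(csa):
    """Return the threshold with the smallest mean STD."""
    scores = mean_std_per_threshold(csa)
    return min(scores, key=scores.get)
```

With made-up toy values `{0.1: {"sub-01": {"T1w": 70.0, "T2w": 71.0}}, 0.5: {"sub-01": {"T1w": 68.0, "T2w": 72.0}}}`, the mean STDs are 0.5 and 2.0, so `pick_threshold` returns `0.1`.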

Important clarification: The threshold chosen here will only be used during inference (i.e., when the user requires a soft prediction as the output). For future versions of the contrast-agnostic model, the segmentations used for training are soft segmentations binarized at 0.5, so there is no need to worry about thresholds for training.

naga-karthik commented 4 months ago

Closing, as the following has been decided:

- If predictions are to be used for training the next version of the model --> binarize outputs with threshold=0.5 (because we want to feed binarized labels to the model).
- If predictions are to be used just as (soft) outputs --> use threshold=0.1 (so that values < 0.1 are set to 0 and the rest are kept as-is), but do not binarize (i.e., do not create a 0/1 array).
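The two rules above can be sketched as a single post-processing function. The soft branch follows the rule exactly (values below 0.1 set to 0, everything else unchanged); for the binary branch, treating values equal to 0.5 as foreground is an assumption. The function name and the `for_training` flag are hypothetical.

```python
import numpy as np

SOFT_THR = 0.1    # inference: zero out the tiny values far from the cord
BINARY_THR = 0.5  # training labels: binarize


def postprocess(pred: np.ndarray, for_training: bool) -> np.ndarray:
    """Hypothetical post-processing of a raw model prediction."""
    if for_training:
        # Binarize to a 0/1 mask (assumption: values >= 0.5 become 1).
        return (pred >= BINARY_THR).astype(np.uint8)
    # Keep the prediction soft: only values < 0.1 are set to 0.
    return np.where(pred < SOFT_THR, 0.0, pred)
```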

sct-pipeline/contrast-agnostic-softseg-spinalcord, issue #98