Investigate invalid use of ground truth which leads to improved performances

spinalcordtoolbox / disc-labeling-hourglass

Labeling of intervertebral discs using the Hourglass deep learning architecture.

GNU Lesser General Public License v2.1

Investigate invalid use of ground truth which leads to improved performances #23

Open NathanMolinier opened 1 year ago

NathanMolinier commented 1 year ago

Description

Recently, I noticed that extract_skeleton, the post-processing function applied during testing, used information from the labels to determine how many intervertebral discs were present in the image. As a result, the reported hourglass performance is biased: the network appears less prone to false detections than it actually is.

INVALID step: Images containing more discs than the hourglass was trained to find were also removed from both the testing and the training sets.

Conclusion

Further investigation is needed in the functions extract_skeleton and create_skeleton to see how this post-processing could be improved or fixed so that the hourglass performance increases fairly.
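As a rough illustration of label-free post-processing, the sketch below keeps a disc only when the peak of its predicted heatmap exceeds a confidence threshold, so the number of detected discs never comes from the ground truth. This is not how extract_skeleton works; the function name `detect_discs`, the dict-of-heatmaps input, and the threshold value are all assumptions made for the example.

```python
def detect_discs(heatmaps, threshold=0.5):
    """Keep one detection per disc class, using only the network output.

    heatmaps: dict mapping disc number -> 2D list of scores (hypothetical format).
    Returns a dict mapping disc number -> (row, col) of the heatmap peak,
    restricted to classes whose peak score is at least `threshold`.
    """
    detections = {}
    for disc, heatmap in heatmaps.items():
        # Find the highest score and its position in this class's heatmap.
        best_score, best_pos = max(
            (score, (r, c))
            for r, row in enumerate(heatmap)
            for c, score in enumerate(row)
        )
        # The decision depends on the prediction alone, never on the labels.
        if best_score >= threshold:
            detections[disc] = best_pos
    return detections


# Toy heatmaps (made-up values): disc 1 is confident, disc 2 is not.
toy_heatmaps = {
    1: [[0.0, 0.9], [0.1, 0.0]],
    2: [[0.2, 0.1], [0.0, 0.3]],
}
detections = detect_discs(toy_heatmaps, threshold=0.5)
```

With a fixed threshold, the trade-off between missed discs and false detections becomes an explicit parameter that can be tuned on a validation set rather than hidden in the labels.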

NathanMolinier commented 1 year ago

After removing this use of the ground truth, I noticed that some issues and false detections started to appear. One problem is that some classes corresponding to discs never seen during training are detected, which causes errors in the skeleton reconstruction. Here is an example of a mask returned by the hourglass network for disc 15, which is not present in the training set.

To understand where these seemingly random masks come from, I will inspect the input training masks.

NathanMolinier commented 1 year ago

The best approach to avoid such false detections seems to be to reduce the number of classes. Indeed, if we look at the spinegeneric dataset:

For T1w images: T1w_distribution

For T2w images: T2w_distribution

jcohenadad commented 1 year ago

But the spine-generic dataset should not be used as an absolute representation of scans around the world. Some hospitals acquire only 4-5 vertebrae, others acquire 10-15. There is no general rule about the number of discs to expect in one MRI scan unfortunately.

NathanMolinier commented 1 year ago

Yes, I agree, but I still need to choose a relevant number of classes for the training to limit false detections caused by under-represented discs.
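One possible way to pick the classes is to count how often each disc is labeled across the training images and keep only the discs that appear in a large enough fraction of them. The sketch below illustrates that idea; the helper `select_disc_classes`, the `min_fraction` cutoff, and the toy label lists are hypothetical and not taken from the repository.

```python
from collections import Counter


def select_disc_classes(labels_per_image, min_fraction=0.5):
    """Return the sorted disc numbers labeled in at least `min_fraction` of images.

    labels_per_image: list of lists, one list of disc numbers per image.
    """
    n_images = len(labels_per_image)
    if n_images == 0:
        return []
    counts = Counter()
    for discs in labels_per_image:
        counts.update(set(discs))  # count each disc at most once per image
    return sorted(d for d, c in counts.items() if c / n_images >= min_fraction)


# Toy labels (made-up): four images with different fields of view.
toy_labels = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5, 6],
    [2, 3, 4, 5, 6, 7],
    [3, 4, 5],
]
kept = select_disc_classes(toy_labels, min_fraction=0.5)
```

The cutoff only reflects the dataset it is computed on, so it would need to be recomputed whenever data with a different FOV is added.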

Moreover, I might need to rewrite some functions to handle images with a different FOV.

NathanMolinier commented 1 year ago

The hourglass performs poorly on new datasets when trained only on the spinegeneric dataset: for most of the discs, several predictions are produced. Further investigation is needed regarding input parameters (resolution, FOV...) to detect potential differences in the input data.

Other solution: Retrain the hourglass network with more data

Number of predictions for each of the 11 classes (discs) followed by the number of combinations possible.
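To make the notion of "combinations" concrete: assuming exactly one candidate is selected per detected disc, the number of possible skeletons is the product of the candidate counts. The sketch below computes that product; the function name and the per-class counts are made up for illustration, and skipping classes with no candidate is an assumption.

```python
import math


def count_combinations(candidates_per_class):
    """Number of ways to pick one candidate per class that has candidates.

    candidates_per_class: list of prediction counts, one entry per disc class.
    Classes with zero predictions are skipped (assumption for this sketch).
    """
    counts = [n for n in candidates_per_class if n > 0]
    return math.prod(counts) if counts else 0


# Made-up counts for six classes: 1 * 2 * 3 * 1 * 2 = 12 combinations.
n_combinations = count_combinations([1, 2, 3, 1, 0, 2])
```

Because the count grows multiplicatively, a few extra false candidates per disc quickly make exhaustive skeleton matching expensive.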

NathanMolinier commented 1 year ago

I just noticed that the loss was only computed on non-empty GT masks, resulting in many false positive predictions.

https://github.com/spinalcordtoolbox/disc-labeling-hourglass/blob/1c8ff2893f85883ac94c21ec72257f0af4194009/src/dlh/models/jointsmseloss.py#L31-L37

This feature should be removed. For now, I will simply set the variable use_target_weight to False.
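The effect of this weighting can be shown with a simplified, pure-Python sketch. It is not the code from jointsmseloss.py; it only assumes that a channel whose ground truth mask is empty gets a weight of 0 when use_target_weight is enabled. The function `joints_mse` and the toy values are hypothetical.

```python
def joints_mse(pred, target, use_target_weight):
    """Mean squared error averaged over channels (one channel per disc).

    pred, target: lists of channels, each channel a flat list of floats.
    When use_target_weight is True, channels with an all-zero target are
    given weight 0, so predictions on those channels are never penalized.
    """
    total = 0.0
    for p, t in zip(pred, target):
        weight = 0.0 if (use_target_weight and not any(t)) else 1.0
        total += sum((weight * a - weight * b) ** 2 for a, b in zip(p, t)) / len(p)
    return total / len(pred)


# Channel 0: disc present and roughly found. Channel 1: empty GT, false positive.
toy_target = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
toy_pred = [[0.0, 0.8, 0.0, 0.0], [0.0, 0.0, 0.9, 0.0]]

loss_masked = joints_mse(toy_pred, toy_target, use_target_weight=True)
loss_unmasked = joints_mse(toy_pred, toy_target, use_target_weight=False)
```

In this toy case the false positive on the empty channel contributes nothing to the masked loss, while the unmasked loss penalizes it, which is consistent with the false positives described above.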

https://github.com/spinalcordtoolbox/disc-labeling-hourglass/blob/1c8ff2893f85883ac94c21ec72257f0af4194009/src/dlh/models/jointsmseloss.py#L15-L19