Open joshuacwnewton opened 1 year ago
The test uses dummy data, rather than actual spinal cord landmarks.
Looking at the generated dummy data in FSLeyes shows:
So, the registration barely has any effect at all on the dummy data images used in this test.
(In fact, checking the DICE score of the unregistered images gives a value of `0.3333` --> the DICE score only increases to `0.397059` because one of the non-matching points shrinks in size!)
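To make the arithmetic concrete, here is a minimal sketch of how a DICE score can rise without any real improvement in alignment. The layout is a toy assumption (three 3x3x3 "landmarks" per image, only one pair overlapping) and is not the test's actual dummy data; the helper names `cube` and `dice` are hypothetical. The blob sizes were picked so the numbers land on the values reported above, but the real volumes may differ.

```python
from itertools import product


def cube(corner, size=3):
    """Return the set of voxel coordinates of a size^3 cube at `corner`."""
    x0, y0, z0 = corner
    return {(x0 + dx, y0 + dy, z0 + dz)
            for dx, dy, dz in product(range(size), repeat=3)}


def dice(a, b):
    """DICE coefficient of two voxel sets: 2|A n B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Toy "fixed" image: three landmarks of 27 voxels each.
fixed = cube((2, 2, 2)) | cube((8, 8, 8)) | cube((14, 2, 14))

# Toy "moving" image before registration: one landmark matches,
# the other two are displaced and do not overlap at all.
moving_before = cube((2, 2, 2)) | cube((8, 14, 8)) | cube((14, 8, 14))

# Toy "moving" image after registration: nothing has moved closer,
# but one non-matching landmark has shrunk to a single voxel.
moving_after = cube((2, 2, 2)) | cube((8, 14, 8), size=1) | cube((14, 8, 14))

dice_before = dice(fixed, moving_before)  # 2*27 / (81 + 81)
dice_after = dice(fixed, moving_after)    # 2*27 / (81 + 55)

print(f"{dice_before:.4f}")  # 0.3333
print(f"{dice_after:.6f}")   # 0.397059
```

In this sketch the overlap stays at 27 voxels in both cases; only the denominator changes, which is why the score alone says little about whether the landmarks were actually brought together.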
I notice that the test description explicitly mentions `BSpline`:
So perhaps in the past, the test made use of these now-commented-out lines:
But ever since they were commented out, the test ceased to be useful?
For context, this test in its current form was added in https://github.com/spinalcordtoolbox/spinalcordtoolbox/commit/4e4951d36dab6ae70a2c2389130e96df701c27f4, with a commit description of
TEST: test_ants: re-added old ANTs test to check compatibility
Since the test was "re-added", my hypothesis is that maybe the lines were commented out from an even older version of the test (e.g. due to a command or two not working, since SCT doesn't bundle the `isct_ANTSLandmarksBSplineTransform` binary anymore).
Perhaps the intent was even to fixup the test at a later date?
In the interim, though, with poorer results, perhaps the `0.93` was fudged to a `0.39` to make the test pass... And this faulty test has been present in SCT... for the past 6 years... Woof. :sweat:
Sorry for this mess, which I am probably responsible for 😬. As you pointed out, it is not unlikely that this test is now unnecessary, given that `sct_ANTSUseLandmarkImagesToGetBSplineDisplacementField` is no longer bundled in SCT and that we created specific code for landmark-based registration. That said, I think the code we created only handles linear transformations (BSpline-based landmark transformation might still be relevant...?). I'd need more time to dig, but I could take the time if you'd like me to. Maybe during an SCT meeting.
> I'd need more time to dig, but I could take the time if you'd like me to. Maybe during an SCT meeting.
Not a worry! On reflection, I don't think digging further into the history is very necessary or high-priority, because I think it's possible to draw conclusions about this test based on its current state alone (i.e. I think the test is no longer useful).
Specifically, rather than trying to rehabilitate an older, outdated test, I think it would be more helpful to focus energy on expanding our existing suite of registration tests.
(In other words, possibly removing this test, then replacing the `sct_check_dependencies` call with one to, e.g., `test_sct_register_to_template_dice_coefficient_against_groundtruth`, which uses `antsRegistration` internally.)
In the `isct_test_ants` script, we test the external ANTs binary `antsRegistration` against a hardcoded DICE score: https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/403e04e34a128e4edd968e7ef685747f7a08475e/testing/dependencies/test_ants.py#L26
But, @mguaypaq pointed out in https://github.com/spinalcordtoolbox/spinalcordtoolbox/pull/3915#discussion_r1006114514:
And, running the test, the actual DICE score is nowhere near the reported ideal of `0.931034`:
This test has been like this since its inception, all the way back in 2016 (https://github.com/spinalcordtoolbox/spinalcordtoolbox/commit/4e4951d36dab6ae70a2c2389130e96df701c27f4), so. Hm!