Verify whether the `isct_test_ants` test is actually useful for testing ANTs

joshuacwnewton commented 1 year ago

_Originally posted by @joshuacwnewton in https://github.com/spinalcordtoolbox/spinalcordtoolbox/pull/3932#discussion_r1014473659_

In the isct_test_ants script, we test the external ANTs binary antsRegistration against a hardcoded DICE score:

https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/403e04e34a128e4edd968e7ef685747f7a08475e/testing/dependencies/test_ants.py#L26

But, @mguaypaq pointed out in https://github.com/spinalcordtoolbox/spinalcordtoolbox/pull/3915#discussion_r1006114514:

0.39 looks like a typo, given that it's supposed to be a threshold close to 0.931034? I'd be curious to know if this test actually rejects the (unspecified) versions of ANTs that we're trying to guard against.

And, running the test, the actual DICE score is nowhere near the reported ideal of 0.931034:

This test has been like this since its inception, all the way back in 2016 (https://github.com/spinalcordtoolbox/spinalcordtoolbox/commit/4e4951d36dab6ae70a2c2389130e96df701c27f4), so. Hm!

joshuacwnewton commented 1 year ago

The test uses dummy data, rather than actual spinal cord landmarks.

Looking at the generated dummy data in FSLeyes shows:

`data_dest`: Three 3x3x3 clusters arranged vertically, representing straightened spinal cord landmarks https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/403e04e34a128e4edd968e7ef685747f7a08475e/testing/dependencies/test_ants.py#L45-L49 ![image](https://user-images.githubusercontent.com/16181459/200404882-62b76eb2-d631-4483-aed4-cb1321483576.png) `data_src`: Three 3x3x3 clusters, with two of the clusters offset, representing curved spinal cord landmarks https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/403e04e34a128e4edd968e7ef685747f7a08475e/testing/dependencies/test_ants.py#L39-L43 ![image](https://user-images.githubusercontent.com/16181459/200410983-ecc90668-e475-4c2b-bf92-5a0474b34dad.png) `data_src_reg`: Almost identical to `data_src`, with the top cluster shrunken by a little bit. ![image](https://user-images.githubusercontent.com/16181459/200411157-dbcecd96-9cda-486f-aeab-d239be02a2cf.png)

So, the registration barely has any effect at all on the dummy data images used in this test.

(In fact, checking the DICE score of the unregistered images gives a value of 0.3333 --> the DICE score only increases to 0.397059 because one of the non-matching points shrinks in size!)

joshuacwnewton commented 1 year ago

I notice that the test description explicitly mentions BSpline:

https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/403e04e34a128e4edd968e7ef685747f7a08475e/testing/dependencies/test_ants.py#L20-L21

So perhaps in the past, the test made use of these now-commented-out lines:

https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/403e04e34a128e4edd968e7ef685747f7a08475e/testing/dependencies/test_ants.py#L71-L87

But ever since they were commented out, the test ceased to be useful?

For context, this test in its current form was added in https://github.com/spinalcordtoolbox/spinalcordtoolbox/commit/4e4951d36dab6ae70a2c2389130e96df701c27f4, with a commit description of

TEST: test_ants: re-added old ANTs test to check compatibility

Since the test was "re-added", my hypothesis is that maybe the lines were commented out from an even older version of the test (e.g. due to a command or two not working, since SCT doesn't bundle the isct_ANTSLandmarksBSplineTransform binary anymore).

Perhaps the intent was even to fixup the test at a later date?

https://github.com/spinalcordtoolbox/spinalcordtoolbox/blob/403e04e34a128e4edd968e7ef685747f7a08475e/testing/dependencies/test_ants.py#L59

In the interim, though, with poorer results, perhaps the 0.93 was fudged to a 0.39 to make the test pass... And this faulty test has been present in SCT... for the past 6 years... Woof. :sweat:

jcohenadad commented 1 year ago

sorry for this mess, which i am probably responsible for 😬. As you pointed out, it is not unlikely that this test is now unnecessary given that 'sct_ANTSUseLandmarkImagesToGetBSplineDisplacementField' is no more bundled in SCT, and that we created specific code for landmark-based registration. Although I think the code we created was only for linear transformation (bspline-based landmark transformation might still be relevant...?). I'd need more time to dig, but I could take the time if you'd like me to. Maybe during an SCT meeting.

joshuacwnewton commented 1 year ago

I'd need more time to dig, but I could take the time if you'd like me to. Maybe during an SCT meeting.

Not a worry! On reflection, I don't think digging further into the history is very necessary or high-priority, because I think it's possible to draw conclusions about this test based on its current state alone (i.e. I think the test is no longer useful).

Specifically, I think that rather than trying to rehabilitate an older, outdated test, I think it might be more helpful to focus energy on expanding our existing suite of registration tests.

(In other words, possibly removing this test, then replacing the sct_check_dependencies call with one to, e.g., test_sct_register_to_template_dice_coefficient_against_groundtruth, which uses antsRegistration internally.)

spinalcordtoolbox / spinalcordtoolbox

Verify whether the `isct_test_ants` test is actually useful for testing ANTs #3935