microsoft / Recursive-Cascaded-Networks

[ICCV 2019] Recursive Cascaded Networks for Unsupervised Medical Image Registration
https://arxiv.org/abs/1907.12353
MIT License

Implementation differences of VTN in this repo compared to the VTN paper, and retraining results #21

Closed yjump closed 4 years ago

yjump commented 4 years ago

Thanks a lot for making your code and dataset publicly available, and congratulations to the authors, because the recursive cascaded strategy of base networks does work well for image registration.

Recently, I downloaded your dataset and tried to retrain the model with TensorFlow 1.5 and CUDA 9.1. I have the following questions:

1. According to this repo, a checkpoint is saved every 6 hours and the 99500-epoch ckpt is used for evaluation. Is this the same setting used to report the final results in your paper? (The paper does not mention whether the listed results come from the best checkpoint or from some other selection.)

2. The implementation of VTN in this repo differs from the original VTN paper, "Unsupervised 3D End-to-End Medical Image Registration with Volume Tweening Network". The architecture shown in Fig. 3 and Fig. 4 of the VTN paper does not have conv3_1 (see basenet.py, line 68 and line 210) in the affine and deformable subnetworks. Also, as mentioned in another issue, the decoder uses 5 additional upsampling branches (see basenet.py, line 95 etc.), which are likewise not mentioned in the VTN paper. (A minor point, but interestingly these upsampling branches do not use LeakyReLU as activation after their conv and deconv layers; see the sketch after this list.)

3. Perhaps due to the above, when I retrain your "improved VTN" on the liver CT scans, the result of the 1-cascade VTN is Dice 0.915 and landmark distance 13.1 when evaluating the 99500-epoch ckpt, close to your results in "Recursive Cascaded Networks for Unsupervised Medical Image Registration". But when I adjust it to retrain the original VTN (removing conv3_1 and the 5 upsampling branches), the result of the 1-cascade VTN is Dice 0.909 and landmark distance 13.2. There is a relatively obvious difference in the Dice scores.

These findings do not call the success of the recursive cascaded strategy into question. Could you address the difference between the "improved VTN" and the original VTN? Or am I misunderstanding something?
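To make the difference concrete, here is a minimal sketch of my reading of one decoder stage (written with TF2/Keras layers for brevity, not the repo's exact TF1 code; filter sizes and names are illustrative only): the upsampled flow prediction from the previous stage is concatenated alongside the usual skip connection, and the prediction branch itself (conv + deconv) has no LeakyReLU.

```python
from tensorflow.keras import layers

def decoder_stage(dec_feat, skip_feat, upsampled_pred=None, filters=64):
    """Illustrative sketch of one decoder stage with the extra prediction branch."""
    # Standard path: deconv-upsample the decoder features with LeakyReLU,
    # as in the VTN paper's decoder.
    x = layers.Conv3DTranspose(filters, 4, strides=2, padding='same')(dec_feat)
    x = layers.LeakyReLU(0.1)(x)

    # Concatenate the encoder skip connection (what the paper describes) and,
    # if present, the upsampled prediction from the previous stage (the extra
    # branch discussed in this issue).
    parts = [x, skip_feat]
    if upsampled_pred is not None:
        parts.append(upsampled_pred)
    x = layers.Concatenate(axis=-1)(parts)

    # Extra branch: predict a 3-channel flow at this resolution and deconv-
    # upsample it, with no activation after either layer.
    pred = layers.Conv3D(3, 3, padding='same', activation=None)(x)
    pred_up = layers.Conv3DTranspose(3, 4, strides=2, padding='same',
                                     activation=None)(pred)
    return x, pred_up
```

Removing `upsampled_pred` from the concatenation (and dropping the prediction branch) is what I mean by adjusting the architecture back to the paper's Fig. 3/4.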

Again, thanks a lot.

zsyzzsoft commented 4 years ago
  1. Yes, the listed results are from the 99500-iter ckpt.
  2. The original VTN also uses this implementation. The upsampling branches are not mentioned because they were originally designed for deep supervision; we did not end up using deep supervision, so we did not think they would affect things that much.
  3. This finding is interesting. However, in our reported results, the score is higher than in the original VTN paper only because the loss function is computed only on the very end images.
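Roughly, what I mean by "the very end images" is something like the following simplified sketch (not our exact code; `warp_fn`, `similarity`, and `regularizer` are placeholders for the actual components):

```python
def cascade_loss(fixed, moving, cascades, warp_fn, similarity, regularizer,
                 reg_weight=1.0):
    """Toy illustration: similarity is measured only on the final warped image.

    `cascades` is a list of subnetworks mapping (fixed, moving) -> flow field;
    `warp_fn(image, flow)` applies the flow to the image. How regularization is
    aggregated here (summed over every cascade's flow) is just one plausible
    choice for the sketch, not necessarily the exact scheme in the code.
    """
    warped = moving
    reg_total = 0.0
    for net in cascades:
        flow = net(fixed, warped)          # each subnetwork refines the warp
        warped = warp_fn(warped, flow)     # recursively warp the running image
        reg_total = reg_total + regularizer(flow)
    # No similarity loss on intermediate warps: only the very end image counts.
    return similarity(fixed, warped) + reg_weight * reg_total
```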
yjump commented 4 years ago

Hi, thank you for your kind answers. Regarding your comment #2, my concern is simply that there may be a mismatch between the practical implementation of VTN and what is described in the VTN (IJBHI) paper. The "deep supervision" branches are concatenated and used as part of the input to the next deconv layer, but the original paper only mentions skip connections. Also, if you look at the number of conv layers in the encoder and in the affine network, it differs from what is drawn in Fig. 3 of the VTN paper. Regarding your comment #3, there may be a misunderstanding: I am not trying to reproduce the results reported in the VTN paper, since the training and testing sets differ between the two works, and I do not understand what "the very end images" means. What I am trying to report is that I implemented a VTN exactly as drawn in the IJBHI (VTN) paper (because this recursive cascaded paper says it "inherits VTN"), and a relatively obvious difference appears when the architecture is adjusted to be exactly what is described in VTN.

This may not be a big problem, but it may confuse other people interested in these two works.

Anyway, thanks for your contributions.

zsyzzsoft commented 4 years ago

Yes, thank you for your interest; I acknowledge that the VTN paper is unclear about the upsampling branches. Here we can assume that VTN is supposed to have them.