Why does TMO perform better than TMO++?

JawadTawhidi commented 8 months ago

Hi, sorry to bother you so much; it is because I really like your approach. I have a few questions:

Is there any difference between TMO and TMO++ in the training stage? As far as I can tell from the papers and the code, there is none: both randomly use RGB images and optical flow maps as motion encoder input.
The major difference I see between them is the output selection algorithm, which does not affect the training process. Am I right?
I trained TMO (rn101 encoder) and TMO++ (rn101 encoder) on ultrasound data, using ultrasound images in place of DUTS and ultrasound videos in place of DAVIS 2016. With the same dataset for both, TMO performs better than TMO++: TMO gets 66.4 in terms of the mean of J and F, while TMO++ gets 62.3. The gap is large and does not seem reasonable to me; it would only make sense if the training stages of TMO and TMO++ differed. Could you please help me understand it?

suhwan-cho commented 8 months ago

There is no difference between TMO and TMO++ during network training.
Yes, you are right.
TMO uses optical flow maps as motion encoder input, whereas TMO++ adaptively uses RGB images and optical flow maps as motion encoder input. If TMO++ shows lower performance than TMO, I think it is because using RGB images as motion encoder input performs much worse than using optical flow maps. I recommend checking the performance of each of these two protocols.
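A minimal sketch of such a check, assuming the model has already been run twice per frame (once with optical flow and once with RGB as motion encoder input) and the binary masks are available. The names `region_similarity` and `mean_j` and the toy masks are hypothetical and not part of the TMO code; J is computed here as plain intersection-over-union.

```python
def region_similarity(pred, gt):
    """J (intersection-over-union) between two binary masks given as nested lists of 0/1."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_j(preds, gts):
    """Average J over a list of frames."""
    scores = [region_similarity(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)


# Toy data: two 2x2 frames (hypothetical, for illustration only).
gts = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
flow_preds = [[[1, 1], [0, 0]], [[0, 1], [0, 0]]]  # motion encoder fed optical flow
rgb_preds = [[[1, 0], [1, 0]], [[1, 1], [0, 0]]]   # motion encoder fed RGB

print("flow-only mean J:", mean_j(flow_preds, gts))
print("rgb-only  mean J:", mean_j(rgb_preds, gts))
```

If the RGB-only score is far below the flow-only score on your data, that would be consistent with the explanation above.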

JawadTawhidi commented 8 months ago

Thank you so much for your response. My main confusion is this: according to the papers and code, even if TMO++ does not perform better than TMO, it should at least not perform worse.

As far as I understand, TMO++ feeds RGB images to the motion encoder in one pass and flow maps in another pass, then compares the two results and outputs the better one.

So one of the two passes uses flow maps as motion encoder input, which is exactly what TMO does, and must give the same result as TMO. If the pass with RGB images as motion encoder input does not give a higher result, the output selection algorithm should select the flow-based output, which is the same as TMO's.

Am I wrong? Or is it possible and reasonable for TMO++ to perform worse than TMO on some datasets? (In the TMO++ paper, the J value of TMO++ (MiT-b1) on DAVIS 2016 is 86.5, while the J value of TMO (MiT-b1) is 86.6, which suggests that TMO++ may score slightly lower than TMO on some datasets.)

suhwan-cho commented 8 months ago

TMO++ selects its output based on the "confidence score", not the "evaluation score" (selecting by evaluation score would be cheating, since it requires the ground truth). Normally, an output with a higher confidence score also has a higher evaluation score, but this is not always the case.
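To make the distinction concrete, here is a toy sketch. The confidence measure used (average distance of the foreground probabilities from 0.5, rescaled to [0, 1]) is an assumption for illustration and may differ from the score TMO++ actually computes; `confidence`, `select_by_confidence`, and all numbers are hypothetical.

```python
def confidence(prob_map):
    """Assumed confidence: average distance of probabilities from 0.5 (0 = unsure, 1 = certain)."""
    values = [p for row in prob_map for p in row]
    return sum(abs(p - 0.5) * 2 for p in values) / len(values)


def binarize(prob_map):
    """Threshold a probability map at 0.5 to get a binary mask."""
    return [[1 if p > 0.5 else 0 for p in row] for row in prob_map]


def iou(pred, gt):
    """Evaluation score J; unlike confidence, it needs the ground-truth mask."""
    pairs = [(p, g) for pr, gr in zip(pred, gt) for p, g in zip(pr, gr)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    return 1.0 if union == 0 else inter / union


def select_by_confidence(flow_prob, rgb_prob):
    """Pick the output with the higher confidence; ground truth is never consulted."""
    return flow_prob if confidence(flow_prob) >= confidence(rgb_prob) else rgb_prob


gt = [[1, 1], [0, 0]]
flow_prob = [[0.7, 0.6], [0.4, 0.3]]  # correct but hesitant
rgb_prob = [[0.9, 0.1], [0.9, 0.1]]   # confident but wrong

chosen = select_by_confidence(flow_prob, rgb_prob)
print("flow-only J:", iou(binarize(flow_prob), gt))
print("selected  J:", iou(binarize(chosen), gt))
```

In this toy case the RGB-based output is more confident but less accurate, so the selected result scores lower than the flow-only result even though the flow-only output was one of the two candidates.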

As you mentioned, this is why TMO++ shows lower performance than TMO on the DAVIS 2016 validation set.

If the performance gap between the different motion encoder inputs (RGB image vs. optical flow map) is large, the correlation between the confidence score and the evaluation score is not clearly evident.

Therefore, I recommend skipping the output selection step, as it is not stable in such cases.
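One possible way to judge whether selection is stable on a given dataset is to measure, on labeled validation frames, how often the higher-confidence output is also the higher-scoring one. The sketch below assumes per-frame confidence and J values have already been computed for both inputs; the helper `selection_agreement` and the record layout are hypothetical and not part of the TMO code.

```python
def selection_agreement(records):
    """Fraction of frames where the higher-confidence output is also the higher-J output.

    Each record is (conf_flow, conf_rgb, j_flow, j_rgb) for one labeled frame.
    Frames where both outputs have equal J are skipped, since either choice is fine.
    """
    agree = total = 0
    for conf_flow, conf_rgb, j_flow, j_rgb in records:
        if j_flow == j_rgb:
            continue
        total += 1
        picks_flow = conf_flow >= conf_rgb
        flow_is_better = j_flow > j_rgb
        agree += 1 if picks_flow == flow_is_better else 0
    # With no informative frames, report full agreement by convention.
    return agree / total if total else 1.0


# Hypothetical per-frame numbers, for illustration only.
records = [
    (0.90, 0.95, 0.80, 0.40),  # confidence prefers RGB, but flow is better
    (0.85, 0.70, 0.75, 0.50),  # confidence prefers flow, and flow is better
    (0.60, 0.92, 0.70, 0.30),  # confidence prefers RGB, but flow is better
    (0.88, 0.80, 0.65, 0.65),  # tie in J, skipped
]
print("agreement:", selection_agreement(records))
```

A low agreement rate on such a split would suggest that confidence is a poor proxy for quality on that data, in which case using the flow-only output directly is the safer choice.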

suhwan-cho / TMO

Why TMO performs better than TMO++? #17