seoungwugoh / STM

Video Object Segmentation using Space-Time Memory Networks

There is a question when I try to reproduce the training process #4

Closed lqxisok closed 4 years ago

lqxisok commented 4 years ago

The paper mentions that STM samples three frames during the main training stage. After randomly sampling three frames, I am confused about how the model runs its forward pass. Suppose the three frames are A, B and C. Should I first compute the segmentation of B using the prev_key and prev_value of A generated in the memorize stage, and then feed B and C into the next forward pass? Or do I only need to compute the segmentation of C?

seoungwugoh commented 4 years ago

This is how the training goes:
a) Prepare [A_image, A_mask, B_image, C_image] as input and [B_mask, C_mask] as GT.
b) Memorize [A_image, A_mask].
c) Segment [B_mask] using the memory of A.
d) Memorize [B_image, B_mask].
e) Segment [C_mask] using the memory of A and B.
Losses are computed on B_mask and C_mask.
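For readers who want to see the order of operations, here is a minimal sketch of steps a) to e) in plain Python. It is not the repository's code: `train_step`, `memorize`, `segment` and `loss_fn` are hypothetical names, and the toy stand-ins below only record the call order instead of running a network. It also assumes that the mask memorized in step d) is the model's own prediction for B, since B_mask is listed above only as GT and not as an input.

```python
def train_step(frames, masks, memorize, segment, loss_fn):
    """Run one training pass over [A, B, C] (hypothetical interface).

    frames: [A_image, B_image, C_image]
    masks:  [A_mask, B_mask, C_mask]; A_mask is an input, the rest are GT.
    """
    # b) memorize the first frame together with its given mask
    memory = [memorize(frames[0], masks[0])]
    total_loss = 0.0
    predictions = []
    for t in range(1, len(frames)):
        # c) / e) segment the current frame using everything memorized so far
        pred = segment(frames[t], list(memory))
        predictions.append(pred)
        # the loss is accumulated for every segmented frame (B and C)
        total_loss += loss_fn(pred, masks[t])
        # d) memorize the frame just segmented, unless it is the last one
        # (assumption: the predicted mask is stored, not the GT mask)
        if t < len(frames) - 1:
            memory.append(memorize(frames[t], pred))
    return total_loss, predictions


# Toy stand-ins that log calls so the order can be inspected.
calls = []

def toy_memorize(image, mask):
    calls.append(("memorize", image))
    return (image, mask)

def toy_segment(image, memory):
    calls.append(("segment", image, len(memory)))
    return "pred_" + image

def toy_loss(pred, gt):
    return 1.0  # placeholder value, not a real segmentation loss

loss, preds = train_step(
    ["A", "B", "C"], ["A_mask", "B_mask", "C_mask"],
    toy_memorize, toy_segment, toy_loss,
)
print(calls)
```

In a real setup the three callables would wrap the network's memory encoder, its query/decoder path and a segmentation loss, and the summed loss would be backpropagated once per sampled triplet.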

lqxisok commented 4 years ago

Got it. I had made a small mistake in my code. It works fine now. Thanks for your reply.