lqxisok closed this issue 4 years ago
This is how the training goes:
a) Prepare [A_image, A_mask, B_image, C_image] as input and [B_mask, C_mask] as GT.
b) Memorize [A_image, A_mask].
c) Segment [B_mask] using the memory of A.
d) Memorize [B_image, B_mask].
e) Segment [C_mask] using the memory of A and B.
The losses are computed on both B_mask and C_mask.
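The steps above can be sketched roughly as follows. This is only an illustration of the control flow, not the repository's actual code: `train_step`, `memorize`, `segment`, and `loss_fn` are hypothetical names passed in as plain callables, and the toy stand-ins at the bottom exist only so the sketch runs without a model. One assumption to flag: step d) here memorizes the mask predicted for B; if you prefer to memorize the ground-truth B_mask instead, swap `pred_mask` for `gt_mask` in that line.

```python
def train_step(images, a_mask, gt_masks, memorize, segment, loss_fn):
    """One training step on three sampled frames (hypothetical helper).

    images   : [A_image, B_image, C_image]
    a_mask   : mask of the first frame A, given as input
    gt_masks : [B_mask, C_mask], used only for the loss
    """
    a_image, *later_images = images

    # b) memorize [A_image, A_mask]
    memory = [memorize(a_image, a_mask)]

    predictions = []
    total_loss = 0.0
    for step, (image, gt_mask) in enumerate(zip(later_images, gt_masks)):
        # c) / e) segment the current frame using everything memorized so far
        pred_mask = segment(image, memory)
        predictions.append(pred_mask)

        # the loss is accumulated over both B and C
        total_loss += loss_fn(pred_mask, gt_mask)

        # d) memorize the frame just segmented (assumption: with the
        # predicted mask); nothing reads the memory after the last frame
        if step < len(later_images) - 1:
            memory.append(memorize(image, pred_mask))

    return predictions, total_loss


# Toy stand-ins so the sketch runs without a real network.
seen = []  # records which frames were in memory at each segmentation call


def toy_memorize(image, mask):
    return (image, mask)


def toy_segment(image, memory):
    seen.append((image, [mem_image for mem_image, _ in memory]))
    return 0.5  # constant dummy "prediction"


def toy_loss(pred, gt):
    return abs(pred - gt)


preds, loss = train_step(
    ["A", "B", "C"], 1.0, [1.0, 0.0], toy_memorize, toy_segment, toy_loss
)
print(seen)  # B is segmented with memory [A], C with memory [A, B]
```

So a single training sample produces two segmentation passes (B, then C) and one combined loss, rather than only segmenting C.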
Got it. I made a small mistake in my code. Now it works fine. Thanks for your reply.
The paper mentions that STM samples three frames during the main training stage. After I randomly sample three frames, how the model should do the forward pass confused me for a while. Suppose there are three frames named A, B and C. Should I first compute the segmentation result of B according to the `prev_key` and `prev_value` of A generated in the `memorize` stage, and then feed B and C into the next forward pass? Or do I only need to compute the segmentation result of C?