hkchengrex / XMem

[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
https://hkchengrex.com/XMem/
MIT License

About Training Testing Gap #122

Closed: yyang181 closed this issue 11 months ago

yyang181 commented 11 months ago

Hi, I noticed that during training, the memory bank is built by randomly selecting three frames from the eight-frame input video. Notably, the ground-truth (GT) values, rather than the model's predicted values, are stored in the memory bank. Could you explain the reason for this discrepancy between the training and testing logic?

hkchengrex commented 11 months ago

We do use the model's predicted value in the memory bank. See https://github.com/hkchengrex/XMem/blob/4589acce67dfd952b28f779f9e55a39ce8ebb9d6/model/trainer.py#L111-L113
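
For readers following the thread, here is a minimal, self-contained sketch of the pattern described in the reply: only the first frame's mask comes from ground truth, and every later memory entry is the model's own prediction. It is not the code from `model/trainer.py`. The names `toy_segment` and `train_step_sketch`, the averaging "network", and the rule of always keeping the first frame when sampling reference frames are all assumptions made for illustration.

```python
import random

def toy_segment(frame, memory_masks):
    # Hypothetical stand-in for the network: average the masks read from memory.
    n = len(memory_masks)
    return [sum(m[i] for m in memory_masks) / n for i in range(len(frame))]

def train_step_sketch(frames, first_gt_mask, num_ref_frames=3, seed=0):
    rng = random.Random(seed)
    # Memory entries are (mask, source); only frame 0 is seeded from ground truth.
    memory = [(first_gt_mask, "gt")]
    predictions = []
    for t in range(1, len(frames)):
        if len(memory) <= num_ref_frames:
            refs = memory
        else:
            # Assumed sampling rule: keep the first frame, sample the rest at random.
            refs = [memory[0]] + rng.sample(memory[1:], num_ref_frames - 1)
        pred = toy_segment(frames[t], [mask for mask, _ in refs])
        predictions.append(pred)
        # The prediction, not the GT mask, is written back to memory.
        # The last frame is never read again, so it is not stored.
        if t < len(frames) - 1:
            memory.append((pred, "pred"))
    return memory, predictions

frames = [[0.0] * 4 for _ in range(8)]  # eight dummy frames
memory, predictions = train_step_sketch(frames, [1.0, 0.0, 1.0, 0.0])
sources = [src for _, src in memory]
```

In this toy run, `sources` holds one `"gt"` entry followed only by `"pred"` entries, which mirrors the point of the reply: training, like testing, reads from a memory bank filled with predicted masks after the first frame.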