Question about the algorithm and training procedure

zzzc18 commented 8 months ago

Hi Cheng,

I'm new to the VOS area, and after reading the paper I still have two questions about the algorithm.

1. Does the memory readout in XMem (and in Cutie) turn the VOS task from learning an $\text{img}\rightarrow\text{mask}$ map into learning a $\text{similar img}\rightarrow\text{similar mask}\rightarrow\text{mask}$ map, with the retrieval happening at the local feature level?
2. Is the long-term memory module left out of training and used only at test time? The paper states that the training sequences have length eight, which is smaller than $T_{max}=10$.
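To make the first question concrete, here is a minimal sketch of the retrieval view I have in mind. It is only an illustration, not XMem's actual code: the function name `readout`, the plain negative squared L2 similarity, and the toy list-based shapes are all my assumptions.

```python
import math


def readout(query, mem_keys, mem_values):
    """Soft retrieval for one query pixel (hypothetical helper, not XMem's API).

    query:      list of C floats, the key feature of one query pixel
    mem_keys:   N lists of C floats, key features stored in memory
    mem_values: N lists of D floats, value (mask-related) features in memory
    Returns the retrieved value (D floats) and the N attention weights.
    """
    # Similarity = negative squared L2 distance (assumed for this sketch).
    sims = [-sum((q - k) ** 2 for q, k in zip(query, key)) for key in mem_keys]

    # Numerically stable softmax over the memory entries.
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The readout is the weighted sum of the stored values:
    # similar memory pixels contribute most of the retrieved feature.
    dim = len(mem_values[0])
    out = [sum(w * v[d] for w, v in zip(weights, mem_values)) for d in range(dim)]
    return out, weights


# Toy usage: the query matches the first memory key exactly,
# so the first memory value dominates the readout.
out, weights = readout(
    query=[1.0, 0.0],
    mem_keys=[[1.0, 0.0], [0.0, 1.0]],
    mem_values=[[1.0], [0.0]],
)
```

In this picture the decoder never sees the memory frames directly; it only receives the retrieved feature `out`, which is what I mean by "similar img → similar mask → mask".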

Thank you for taking the time to read this issue. I greatly appreciate any advice you can provide.

hkchengrex commented 7 months ago

zzzc18 commented 7 months ago

Thank you for your reply!

hkchengrex / XMem