seoungwugoh / STM

Video Object Segmentation using Space-Time Memory Networks

Understanding `K` #31

Open tobyshooters opened 4 years ago

tobyshooters commented 4 years ago

From what I understand, K is the number of categories in a dataset. This affects the dimensionality of the embeddings, which have shape (Batch, K, C/8, Time, Height, Width), since a separate embedding is computed for each category. In the case of DAVIS, K=11, since there are 10 categories plus the background category.

If this is correct, then I'm curious why, throughout the code, you ignore the embeddings for the background (index 0 along the K dimension). Wouldn't using them increase performance, along the lines of this paper by Yang et al.?

Also, if we're not using K=0, aren't we wasting memory by calculating these embeddings and storing them in VRAM?
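For a rough sense of scale, here is a minimal sketch of the arithmetic, assuming the (Batch, K, C/8, Time, Height, Width) shape described above and float32 storage. The helper `embedding_bytes` and the values of C, Time, Height, and Width are hypothetical placeholders, not taken from the repository.

```python
from math import prod


def embedding_bytes(batch, k, channels, time, height, width, bytes_per_elem=4):
    """Bytes needed for a (batch, k, channels // 8, time, height, width) tensor."""
    shape = (batch, k, channels // 8, time, height, width)
    return prod(shape) * bytes_per_elem


# Illustrative sizes only (assumed, not read from the STM code).
dims = dict(batch=1, channels=1024, time=5, height=30, width=30)

total = embedding_bytes(k=11, **dims)       # all 11 slots, as with DAVIS
background = embedding_bytes(k=1, **dims)   # a single slot, e.g. the background

print(f"all slots:       {total / 2**20:.1f} MiB")
print(f"background slot: {background / 2**20:.1f} MiB")
print(f"background share: {background / total:.1%}")
```

Under these assumptions, the background slot accounts for 1/K of the embedding memory regardless of the other dimensions, so the potential saving shrinks as K grows.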

Finally, since I'm using this for just people, I've set K=2. Are there any problems with this change?