From what I understood, K is the number of categories in a dataset. This determines the shape of the embeddings, (Batch, K, C/8, Time, Height, Width), since a separate embedding is computed for each category. In the case of DAVIS, K=11: 10 categories plus the background category.
If this is correct, I'm curious why the code ignores the background embeddings (index 0 along the K dimension) throughout. Wouldn't using them increase performance, along the lines of this paper by Yang et al.?
Also, if we're not using K=0, aren't we wasting memory by calculating these embeddings and storing them in VRAM?
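To put a rough number on the memory question, here is a back-of-the-envelope sketch. The sizes below (batch, C, T, H, W) are made up for illustration and not taken from the repo, and I'm assuming float32 storage; `embedding_bytes` is just a helper name I chose.

```python
from math import prod

def embedding_bytes(batch, k, c, t, h, w, bytes_per_elem=4):
    """Bytes needed for a (batch, k, c // 8, t, h, w) tensor.

    bytes_per_elem=4 assumes float32; use 2 for half precision.
    """
    return prod((batch, k, c // 8, t, h, w)) * bytes_per_elem

# Hypothetical sizes, chosen only to illustrate the ratio.
sizes = dict(batch=1, c=256, t=4, h=60, w=107)

full = embedding_bytes(k=11, **sizes)        # all 11 DAVIS slices
without_bg = embedding_bytes(k=10, **sizes)  # background slice dropped

wasted = full - without_bg
print(f"background slice: {wasted / 2**20:.2f} MiB "
      f"({wasted / full:.1%} of the embedding tensor)")
```

Whatever the actual sizes are, the unused slice is 1/K of the tensor: about 9% for K=11, but half of it for K=2.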
Finally, since I'm only using this for people, I've set K=2. Are there any problems with this change?