ydwen / opensphere

A hyperspherical face recognition library based on PyTorch
https://opensphere.world/
MIT License

Understanding SimPLE head #32

Open ElisonSherton opened 7 months ago

ElisonSherton commented 7 months ago

Hi guys,

Kindly note that I have a single GPU and the questions below are asked in that context, i.e. my training runs on a single GPU and is not distributed.

I was trying to go through the implementation of the SimPLE head in opensphere/module/head/simple.py.


In the forward implementation, we can see the following:

mask_p[:, pt:pt+x.size(0)].fill_diagonal_(False)

I think the intuition behind this filling is that we don't want to consider the pair (i, i) as a positive pair, so for the current batch we zero out those positions in the positive mask. But the current batch is not necessarily placed at 0:batch_size in the bank; it is placed at self.ptr:self.ptr + batch_size. Shouldn't the operation above zero out the diagonal starting from self.ptr instead of starting from 0? (For me, self.rank is always 0, since I am using a single GPU.)


For iresnet100, the wrap mode is polarlike, which means that softplus_wrapping is applied as the f_wrapping function. On investigating this particular function, I observed that the output dimensionality of the vectors is altered.

def softplus_wrapping(raw_feats):
    mags = F.softplus(raw_feats[..., :1], beta=1)
    feats = mags * F.normalize(raw_feats[..., 1:], dim=-1)
    return feats

The softplus activation is applied to the 0th index of the embedding, and the remainder of the embedding, i.e. indices 1:embed_dim, is normalized and then scaled by the softplus activation of the 0th index. Why is this operation performed? Also, why is the embedding dimension changed?
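
For a concrete picture, here is a small standalone check I put together (toy sizes, my own sketch rather than code from the repo); it shows the dimension drop and that the first raw channel becomes the magnitude of the wrapped feature:

import torch
import torch.nn.functional as F

def softplus_wrapping(raw_feats):
    mags = F.softplus(raw_feats[..., :1], beta=1)
    feats = mags * F.normalize(raw_feats[..., 1:], dim=-1)
    return feats

raw = torch.randn(4, 64)       # toy batch of 4 raw 64-d embeddings
feats = softplus_wrapping(raw)

print(raw.shape, feats.shape)  # torch.Size([4, 64]) torch.Size([4, 63])
# The norm of each wrapped feature equals softplus of raw channel 0, so one
# channel is consumed as the magnitude and the remaining 63 define the
# direction; that is why the output dimension drops from 64 to 63.
print(torch.allclose(feats.norm(dim=-1, keepdim=True),
                     F.softplus(raw[..., :1], beta=1)))   # True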

On doing a forward pass with embed_dimension = 64 and batch_size of 64, I obtain the following shapes during these particular steps:

Before F_Wrapping X shape: torch.Size([64, 64])
After F_Wrapping  X shape: torch.Size([64, 63])
After F_Wrapping  X_bank shape: torch.Size([8192, 63])
After F_Fusing X shape: torch.Size([64, 63])
After F_Fusing X_bank shape: torch.Size([8192, 63])
After F_Scoring Logits shape: torch.Size([64, 8192])
Positive Logits: torch.Size([256])
Negative Logits: torch.Size([524032])
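
As a quick sanity check on these numbers (my own arithmetic, not something printed by the repo): scoring 64 queries against a bank of 8192 entries gives 64 x 8192 = 524288 pairwise logits, and 256 + 524032 = 524288, so every (query, bank) pair ends up as exactly one positive or negative logit.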

Can you please explain what is going on in this F_wrapping and what its purpose is? I could not find it in the paper either...

Thanks, Vinayak.

ydwen commented 7 months ago

Hi Vinayak,

For the question about mask_p, you can think of the code mask_p[:, pt:pt+x.size(0)].fill_diagonal_(False) as two operations.

  1. temp = mask_p[:, pt:pt+x.size(0)] # take a sub-mask spanning columns pt to pt+x.size(0)-1.
  2. temp.fill_diagonal_(False) # zero out the diagonal, meaning the zeroing-out starts from column 0 in the sub-mask (temp), i.e. from column pt in the full mask (see the toy example below).
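
A minimal toy example of this (standalone, toy sizes; not the exact code from the repo), showing that fill_diagonal_ on the sliced view writes into columns pt to pt+batch_size-1 of the full mask:

import torch

bank_size, batch_size, pt = 12, 4, 8   # toy sizes; pt is the current pointer
mask_p = torch.ones(batch_size, bank_size, dtype=torch.bool)

# Basic slicing along dim 1 returns a view, so fill_diagonal_ modifies mask_p
# in place: entry (i, pt + i) is set to False for i = 0 .. batch_size-1.
mask_p[:, pt:pt + batch_size].fill_diagonal_(False)

print(mask_p.int())  # columns 8..11 now carry a False diagonal, i.e. pair
                     # (i, i) of the current batch inside the bank is excluded.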

For the second question, F_wrapping is a parameterization method for the learned features (vectors). We can parameterize the features in different ways, and the released code shows two of them.

In the SimPLE paper, we use softplus(x[:, 0]) as the magnitude and normalize(x[:, 1:]) as the direction, simply because this gives more stable convergence and better performance. This is just our empirical observation. We hypothesize that this parameterization may facilitate the training process, but we are not quite sure.
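
To illustrate what "different ways to do the parameterization" means, here is a small sketch (illustrative only; the second option in the released code is not necessarily the plain variant shown here):

import torch
import torch.nn.functional as F

raw = torch.randn(2, 64)

# Polar-like parameterization (what softplus_wrapping does): one coordinate
# is dedicated to the magnitude, the remaining 63 to the direction.
polar_mag = F.softplus(raw[..., :1], beta=1)
polar_feats = polar_mag * F.normalize(raw[..., 1:], dim=-1)   # shape (2, 63)

# A plain alternative for comparison: use the raw vector as-is, so the
# magnitude is simply its norm and stays entangled with the direction.
plain_feats = raw                                             # shape (2, 64)
plain_mag = plain_feats.norm(dim=-1, keepdim=True)

print(polar_feats.shape, plain_feats.shape)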

chuong98 commented 6 months ago

Hi @ydwen, I am trying to understand your implementation of the SimPLE loss.

  1. In the paper, Eq. 6 [equation image omitted], but in your implementation you put b_theta = 0.3 into the score function [code screenshots omitted] and let self.bias be a learnable parameter.

Is this a bug?
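
To make sure I am reading the code right, here is a schematic of what I think is going on (my own toy re-write with assumed names such as HypotheticalSimPLEScore and alpha; it is not the actual implementation): the offset b_theta = 0.3 is fixed inside the score, while self.bias is a separate learnable parameter.

import torch
import torch.nn as nn

class HypotheticalSimPLEScore(nn.Module):
    """Schematic only: fixed b_theta inside the score plus a learned bias."""
    def __init__(self, b_theta=0.3, alpha=30.0):
        super().__init__()
        self.b_theta = b_theta                    # fixed offset, not learned
        self.alpha = alpha                        # scale (assumed value)
        self.bias = nn.Parameter(torch.zeros(1))  # learnable bias

    def forward(self, cos_theta):
        # score = alpha * (cos(theta) - b_theta) + bias
        return self.alpha * (cos_theta - self.b_theta) + self.bias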