ElisonSherton opened this issue 8 months ago
Hi Vinayak,
For the question about mask_p, you can think of the code mask_p[:, pt:pt+x.size(0)].fill_diagonal_(False) as two operations: first slicing out the sub-matrix mask_p[:, pt:pt+x.size(0)], then calling fill_diagonal_(False) on that slice in place.
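Roughly, with toy sizes (a simplified sketch, not the exact code in simple.py):

```python
import torch

batch_size, bank_size, pt = 4, 12, 0          # toy sizes, just for illustration

# mask_p marks which bank entries count as positives for each sample in the batch
mask_p = torch.ones(batch_size, bank_size, dtype=torch.bool)

# operation 1: select the (batch_size x batch_size) block of columns pt:pt+batch_size
block = mask_p[:, pt:pt + batch_size]

# operation 2: set the block's diagonal to False in place, so that sample i is not
# treated as its own positive; the slice is a view, so mask_p itself is modified
block.fill_diagonal_(False)
```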
For the second question, F_wrapping is a parameterization method for the learned features (vectors). The features can be parameterized in different ways, and the released code shows two options.
In the SimPLE paper, we use softplus(x[:, 0]) as the magnitude and normalize(x[:, 1:]) as the direction, simply because this gives more stable convergence and better performance. This is just our empirical observation. We hypothesize that this parameterization may facilitate the training process, but we are not quite sure.
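A rough sketch of this parameterization (simplified, with a made-up function name; the released softplus_wrapping may differ in details):

```python
import torch
import torch.nn.functional as F

def polar_like_wrapping(x: torch.Tensor) -> torch.Tensor:
    """Sketch of the 'polar-like' parameterization described above.

    x: raw network output of shape (batch, d).
    Returns features of shape (batch, d - 1) whose norm is softplus(x[:, 0])
    and whose direction is normalize(x[:, 1:]).
    """
    magnitude = F.softplus(x[:, :1])           # (batch, 1), always positive
    direction = F.normalize(x[:, 1:], dim=1)   # (batch, d - 1), unit norm
    return magnitude * direction
```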
Hi @ydwen, I am trying to understand your implementation of the SimPLE loss. Why do you pass b_theta=0.3 into the score function and also let self.bias be a learnable parameter? Is this a bug?
Hi guys,
Kindly note that I have a single GPU, so the questions below assume single-GPU, non-distributed training.
I was trying to go through the implementation of the SimPLE head in opensphere/module/head/simple.py. In the forward implementation we can see the following:
mask_p[:, pt:pt+x.size(0)].fill_diagonal_(False)
I think the intuition behind this filling is that we don't want to consider the pair (i, i) as a positive pair, so for the current batch we zero out those positions in the positive mask. But the current batch is not necessarily placed at 0:batch_size in the bank; it will be placed at self.ptr:self.ptr + batch_size. For me, self.rank is always 0 (since I am using a single GPU), so shouldn't the operation given above zero out the diagonal starting from self.ptr instead of starting from 0?
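To illustrate my concern with a toy example (my own sketch with made-up sizes, not the repo code): suppose the bank holds 8 entries, the batch size is 2, and the current batch was written at self.ptr = 4.

```python
import torch

bank_size, batch_size, ptr = 8, 2, 4          # made-up sizes; the batch lives in bank slots 4 and 5

# slicing from column 0 (what I believe happens for me, since self.rank is 0)
mask_a = torch.ones(batch_size, bank_size, dtype=torch.bool)
mask_a[:, 0:batch_size].fill_diagonal_(False)          # clears (0, 0) and (1, 1), i.e. bank slots 0 and 1

# slicing from self.ptr (what I would have expected)
mask_b = torch.ones(batch_size, bank_size, dtype=torch.bool)
mask_b[:, ptr:ptr + batch_size].fill_diagonal_(False)  # clears (0, 4) and (1, 5), the slots the batch occupies
```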
For iresnet100, the wrap mode is polarlike, which means that softplus_wrapping will be applied as the f_wrapping function. On investigating this particular function, I observed that the output dimensionality of the vectors is altered: the softplus activation is applied to the 0th index of the embedding, and the remainder of the embedding, i.e. 1:embed_dim, is normalized and then scaled by the softplus of the 0th index. Why is this operation performed? Also, why is the embedding dimension changed?
On doing a forward pass with embed_dimension = 64 and a batch_size of 64, I obtain the following shapes during these particular steps:
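Roughly, I believe they work out like this (a quick sketch to check my understanding; the variable names are my own, not the ones in the repo):

```python
import torch
import torch.nn.functional as F

x = torch.randn(64, 64)                      # (batch_size, embed_dim) = (64, 64)

magnitude = F.softplus(x[:, :1])             # (64, 1)  -- from the 0th index
direction = F.normalize(x[:, 1:], dim=1)     # (64, 63) -- remaining dims, unit-normalized
feature = magnitude * direction              # (64, 63) -- one dimension is consumed as the magnitude

print(x.shape, magnitude.shape, direction.shape, feature.shape)
# torch.Size([64, 64]) torch.Size([64, 1]) torch.Size([64, 63]) torch.Size([64, 63])
```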
Can you please explain what is going on in this F_wrapping and what its purpose is? I could not find it in the paper either...
Thanks, Vinayak.