xbdxwyh closed this issue 3 years ago
Hi,
For unsupervised models, you should use the representation before pooling, so taking outputs.pooler_output is wrong here. Also note that the data we use to calculate alignment are the sentence pairs with scores higher than 4 in STS-B.
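For example, here is a minimal sketch of taking the pre-pooling [CLS] representation with Hugging Face transformers (the princeton-nlp/ checkpoint name and the snippet itself are illustrative, not the exact evaluation code; the score > 4 filter is assumed to be applied when building the STS-B pairs):

import torch
from transformers import AutoTokenizer, AutoModel

name = "princeton-nlp/unsup-simcse-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

sentences = ["A man is playing a guitar.", "A person plays a guitar."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**batch)

# Representation before pooling: the [CLS] token of the last hidden layer.
emb = outputs.last_hidden_state[:, 0]
# outputs.pooler_output would instead pass [CLS] through BERT's pooler (MLP + tanh).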
Hi, thank you for your prompt reply!
The score of 1.2 is computed using the representation before pooling (pooler_output = outputs.last_hidden_state[:,0]). When the representation after pooling is used, the score is 1.632. Throughout, we have been using the sentence pairs with scores higher than 4 in STS-B.
Looking forward to your reply!
Interesting... In that case the average cosine similarity between two positive sentences would be ~0.4, which doesn't look right to me. I think for positive pairs the cosine similarity is generally very high (>0.8). Maybe take the original BERT as a starting point for debugging?
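To make the connection explicit (assuming L2-normalized embeddings): the alignment loss with alpha = 2 is E||f(x) - f(y)||^2, and for unit vectors ||u - v||^2 = 2 - 2*cos(u, v), so an alignment of 1.2 corresponds to an average cosine similarity of about (2 - 1.2)/2 = 0.4.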
Thanks for your answer! We made a mistake in the reshape step (in the get_pair_emb function). After fixing the reshape step, we get the same results. Thanks!
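For anyone hitting the same issue, here is a simplified sketch of what the pairing step should look like (not our exact code), assuming the two sentences of each pair are concatenated along the batch dimension as in the uniformity snippet below:

def get_pair_emb(model, input_ids, attention_mask, token_type_ids):
    outputs = model(input_ids=input_ids,
                    attention_mask=attention_mask,
                    token_type_ids=token_type_ids)
    # Pre-pooling [CLS] representation.
    emb = outputs.last_hidden_state[:, 0]
    batch_size = emb.size(0) // 2
    # Split the concatenated batch in halves; a reshape such as
    # emb.view(batch_size, 2, -1) would pair the sentences incorrectly
    # for this concatenation order.
    z1, z2 = emb[:batch_size], emb[batch_size:]
    return z1, z2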
Hi,
When we compute the alignment of the model unsup-simcse-bert-base-uncased, we get 0.2155, the same result as in the paper. But when we use the model sup-simcse-bert-base-uncased, we get an alignment of 0.1286, which is still different.
Furthermore, we don't get the same uniformity as in the paper. We get a uniformity of about -2.3116 using the model unsup-simcse-bert-base-uncased with all sentences from STS-Benchmark, but it is about -2.7 in the paper. The code is as follows:
import torch
import torch.nn.functional as F

def get_unif(model, dataloader):
    unif_all = []
    with torch.no_grad():
        for data in dataloader:
            # Concatenate the two sentences of each pair along the batch dimension.
            input_ids = torch.cat((data['input_ids'][0], data['input_ids'][1])).cuda()
            attention_mask = torch.cat((data['attention_mask'][0], data['attention_mask'][1])).cuda()
            token_type_ids = torch.cat((data['token_type_ids'][0], data['token_type_ids'][1])).cuda()
            z1, z2 = get_pair_emb(model, input_ids, attention_mask, token_type_ids)
            # L2-normalize the embeddings before measuring uniformity.
            z1 = F.normalize(z1, p=2, dim=1)
            z2 = F.normalize(z2, p=2, dim=1)
            z = torch.cat((z1, z2))
            unif_all.append(uniform_loss(z, t=2))
    return unif_all
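Here uniform_loss follows Wang and Isola's reference implementation (shown as a sketch, since the helper is not included above):

def uniform_loss(x, t=2):
    # Log of the average pairwise Gaussian potential over the normalized embeddings.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()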
Many Thanks!
Hi,
You should use the MLP when you use the supervised model. Regarding uniformity, I'm not really sure where the difference comes from. My guess is that we used a different proportion of the data somehow. There is a chance that I didn't concatenate sentence A and sentence B together (i.e., only calculated the alignment between sentA and sentB), but I'm not sure since I cleaned up the code. This shouldn't affect the analysis anyway, as long as the calculation is consistent.
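In other words, for sup-simcse-bert-base-uncased the score should be computed from the representation after the MLP rather than the raw [CLS] vector. A sketch only, assuming (as the discussion above suggests) that the hosted checkpoint exposes its trained MLP through BERT's pooler, and that the princeton-nlp/ name is the checkpoint in question:

import torch
from transformers import AutoTokenizer, AutoModel

name = "princeton-nlp/sup-simcse-bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

batch = tokenizer(["A man is playing a guitar."], return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

emb_sup = out.pooler_output                # after the MLP: use for the supervised model
# emb_unsup = out.last_hidden_state[:, 0]  # before pooling: use for the unsupervised model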
Thanks for your patient answer!
The alignment computed with the function implemented by Wang and Isola (link) differs a lot from the one in your paper. I compute the alignment with that function directly and get a score of 1.21, but as shown in Fig. 3 of the paper the score is less than 0.25. Could you tell me how the alignment is computed in this paper? My code is as follows: