sl-93 / SUPERVISED-CONTRASTIVE-LEARNING-FOR-PRE-TRAINED-LANGUAGE-MODEL-FINE-TUNING

In this project, I've implemented the Facebook paper on fine-tuning RoBERTa with a contrastive loss.

there may be bugs #1

Open enjlife opened 2 years ago

enjlife commented 2 years ago
for i in range(len(embedding)):
    n_i = label.tolist().count(label[i]) - 1
    inner_sum = 0
    # calculate inner sum
    for j in range(len(embedding) - 1):
        if label[i] == label[j]:
            inner_sum = inner_sum + np.log(dis[i][j] / row_sum[i])
    if n_i != 0:
        contrastive_loss += (inner_sum / (-n_i))
    else:
        contrastive_loss += 0

For example, for label=[1,2,0,1,2], the positive-pair mask (same label, diagonal zeroed) is:

tensor([[0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.],
        [0., 0., 0., 0., 0.],
        [1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.]])

When i=1 and j=4, inner_sum should pick up a term (samples 1 and 4 share label 2), but j only runs up to 3 because of range(len(embedding) - 1), so that positive pair is never counted.
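For reference, that matrix is just the same-label mask with the diagonal zeroed out. A minimal sketch of one way to build it (my construction for illustration, not necessarily the repo's code):

import torch

labels = torch.tensor([1, 2, 0, 1, 2])
# 1 where two samples share a label, 0 elsewhere
mask = torch.eq(labels.view(-1, 1), labels.view(1, -1)).float()
# zero the diagonal so a sample is not its own positive
mask = mask * (1 - torch.eye(len(labels)))
print(mask)
# tensor([[0., 0., 0., 1., 0.],
#         [0., 0., 0., 0., 1.],
#         [0., 0., 0., 0., 0.],
#         [1., 0., 0., 0., 0.],
#         [0., 1., 0., 0., 0.]])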

sl-93 commented 2 years ago

because the " diagonal" elements were removed from the matrix before the loop.

# remove diagonal elements from the matrix
dis = cosine_sim[~np.eye(cosine_sim.shape[0], dtype=bool)].reshape(cosine_sim.shape[0], -1)

so each row of dis has only len(embedding) - 1 entries, which is why j runs over range(len(embedding) - 1). It is not a problem at all; it is part of the formula.
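A quick sketch of what that line does on a toy 3x3 similarity matrix (hypothetical values), just to show how the column indices shift once the diagonal is gone:

import numpy as np

cosine_sim = np.array([[1.0, 0.2, 0.7],
                       [0.2, 1.0, 0.4],
                       [0.7, 0.4, 1.0]])

# drop the self-similarities; each row keeps its other N-1 entries
dis = cosine_sim[~np.eye(cosine_sim.shape[0], dtype=bool)].reshape(cosine_sim.shape[0], -1)
print(dis)
# [[0.2 0.7]
#  [0.2 0.4]
#  [0.7 0.4]]
# for row i, column j of dis is column j of cosine_sim when j < i, and column j + 1 when j >= i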

enjlife commented 2 years ago

Thank you, I am aware of that. I changed the code to print the (i, j) pairs the inner loop actually visits:

# labels=[1,2,0,1,2]
for i in range(5):
    n_i = labels.view(-1,).tolist().count(labels[i]) - 1
    inner_sum = 0
    for j in range(5 - 1):
        if labels[i] == labels[j]:
            print('{}\t{}'.format(i, j))
    if n_i != 0:
        pass
    else:
        pass

0   0
0   3
1   1
2   2
3   0
3   3
4   1

For index=0 (label 1), I think inner_sum should only add the term for index 3, and because the diagonal was removed that entry is dis[0][2], not dis[0][3]; it should not add dis[0][0] and dis[0][3].
For index=2 (label 0), there is no other sample with that label, so I think inner_sum should add 0 instead of dis[2][2]. Am I right?
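One possible fix under that reading (a sketch only, not the repo's final code, and assuming embedding, label, dis, and row_sum as defined in the snippet quoted above): iterate over the original indices, skip j == i, and map j into the diagonal-removed dis:

import numpy as np

contrastive_loss = 0
for i in range(len(embedding)):
    n_i = label.tolist().count(label[i]) - 1
    inner_sum = 0
    for j in range(len(embedding)):
        if j == i:
            continue  # skip the self-pair
        if label[i] == label[j]:
            # original column j sits at column j (if j < i) or j - 1 (if j > i) in dis
            k = j if j < i else j - 1
            inner_sum = inner_sum + np.log(dis[i][k] / row_sum[i])
    if n_i != 0:
        contrastive_loss += inner_sum / (-n_i)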

Thank you for the reply!

sl-93 commented 2 years ago

Yeah, you're right. I fixed the problem. Thank you so much!

enjlife commented 2 years ago

Thank you. I think the loop version may still have problems and it is not an efficient way to compute the loss, so I rewrote the code in a vectorized form. I hope it is useful for you.

import torch
import torch.nn.functional as F


class SupConLossPLMS(torch.nn.Module):
    """Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning: https://arxiv.org/abs/2011.01403
    """
    def __init__(self, device, temperature=0.05):
        super(SupConLossPLMS, self).__init__()
        self.tem = temperature
        self.device = device

    def forward(self, batch_emb, labels=None):
        labels = labels.view(-1, 1)
        batch_size = batch_emb.shape[0]
        # mask[i][j] = 1 if samples i and j share a label
        mask = torch.eq(labels, labels.T).float()
        norm_emb = F.normalize(batch_emb, dim=1, p=2)
        # compute logits (temperature-scaled cosine similarities)
        dot_contrast = torch.div(torch.matmul(norm_emb, norm_emb.T), self.tem)
        # subtract the row-wise max for numerical stability (the second return value is the indices)
        logits_max, _ = torch.max(dot_contrast, dim=1, keepdim=True)
        logits = dot_contrast - logits_max.detach()
        # logits_mask zeros the diagonal (self-contrast); the index tensor must be on the same device
        logits_mask = torch.scatter(torch.ones_like(mask), 1, torch.arange(batch_size).view(-1, 1).to(self.device), 0)
        mask = mask * logits_mask
        # compute log_prob
        exp_logits = torch.exp(logits) * logits_mask
        log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True))
        mask_sum = mask.sum(1)
        # avoid NaN for samples with no positives in the batch
        mask_sum = torch.where(mask_sum == 0, torch.ones_like(mask_sum), mask_sum)
        mean_log_prob_pos = -(mask * log_prob).sum(1) / mask_sum
        return mean_log_prob_pos.mean()

reference https://github.com/HobbitLong/SupContrast/blob/master/losses.py
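For what it's worth, a quick usage sketch (hypothetical shapes: batch_emb is [batch_size, hidden_dim] from the encoder, labels is [batch_size] of integer class ids):

import torch

device = torch.device('cpu')
criterion = SupConLossPLMS(device=device, temperature=0.05)

batch_emb = torch.randn(8, 768)                   # e.g. sentence embeddings for a batch of 8
labels = torch.tensor([1, 2, 0, 1, 2, 0, 1, 2])   # one class id per example, shape [8]
loss = criterion(batch_emb, labels)
print(loss.item())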

Thank you for helping me understand the paper.


viko-3 commented 2 years ago

I think the rewritten code is good. But can you tell me the shapes of batch_emb and labels, or give me an example?