yihwu / DiPmark


About DiPmark Detector. #1

Open intrepidLi opened 1 month ago

intrepidLi commented 1 month ago

Thanks for the great work and resources.

May I ask if there is code for detecting DiPmark? It seems there is no DiPmark detector implementation yet, analogous to the watermark detector in https://github.com/jwkirchenbauer/lm-watermarking.

yihwu commented 1 month ago

Thank you for your interest in our work. Our code is based on our previous work https://github.com/xiaoniu-578fa6bff964d005/UnbiasedWatermark, and the detector is defined at https://github.com/yihwu/DiPmark/blob/34abbeb527243c79bda8043313bb797a731f4ae7/supplement_experiments/common.py#L577. I admit the codebase is a bit messy; we will publish a better version later :)

intrepidLi commented 1 month ago

Thanks! I would also like to know which metric you use to detect the watermark in the code block below (line 645 of supplement_experiments/common.py).

It seems that the z-score used in the paper is $L_G(\gamma) - (1 - \gamma) / \sqrt{n}$; the variable mid2 is the closest in form.

        # Per-token DiP scores and a mask marking the generated (non-padding) tokens.
        score, label_attention_mask = get_dip_score(
            vocab_size, tbatch, wp, device, la_wp=la_wp
        )
        score = score * label_attention_mask
        gamma_list = [0.3, 0.4, 0.5, 0.6, 0.7]
        score_col = torch.zeros([score.shape[0], len(gamma_list)], device=device)
        prob_col = torch.zeros([score.shape[0], len(gamma_list)], device=device)
        num_g_tokens = torch.zeros([score.shape[0], len(gamma_list)], device=device)
        seq_len = torch.sum(label_attention_mask, dim=-1, keepdim=False)
        for i, gm in enumerate(gamma_list):
            # Count "green" tokens, i.e. tokens whose score is at least gamma.
            green_tokens = torch.sum(score >= gm, dim=-1, keepdim=False)
            # z-score-style statistic: (L_G(gamma) - (1 - gamma) * n) / sqrt(n).
            mid2 = (green_tokens - (1 - gm) * seq_len) / torch.sqrt(seq_len)
            mid = green_tokens / seq_len
            # KL divergence between the empirical green fraction and (1 - gamma),
            # used in a Chernoff-style tail bound exp(-n * KL).
            kl_div = mid * torch.log(mid / (1 - gm)) + (1 - mid) * torch.log((1 - mid) / gm)
            prob = torch.exp(-kl_div * seq_len)
            score_col[:, i] = torch.exp(-2 * mid2 * mid2)  # Hoeffding-style bound
            prob_col[:, i] = prob
            num_g_tokens[:, i] = green_tokens

        best_app_score = torch.min(score_col, dim=-1).values.cpu().tolist()
        best_score = torch.min(prob_col, dim=-1).values.cpu().tolist()
        john_score = prob_col[..., 2].cpu().tolist()       # gamma = 0.5
        john_app_score = score_col[..., 2].cpu().tolist()  # gamma = 0.5
        g_tokens = num_g_tokens[..., 2].cpu().tolist()
        max_g_tokens = torch.max(num_g_tokens, dim=-1).values.cpu().tolist()
intrepidLi commented 1 month ago

Sorry, the formula should be $(L_G(\gamma) - (1 - \gamma)n) / \sqrt{n}$
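
For concreteness, here is a minimal sketch (not from the repository; the function name and example numbers are hypothetical) of the corrected statistic, which is the same quantity the variable mid2 holds for each gamma in the quoted code:

    import math

    def dip_z_score(num_green_tokens: int, seq_len: int, gamma: float) -> float:
        # Corrected statistic: (L_G(gamma) - (1 - gamma) * n) / sqrt(n),
        # where (1 - gamma) * n is the expected green count for unwatermarked text.
        expected = (1 - gamma) * seq_len
        return (num_green_tokens - expected) / math.sqrt(seq_len)

    # Example: 80 green tokens out of n = 100 at gamma = 0.5
    # gives (80 - 50) / 10 = 3.0.
    print(dip_z_score(80, 100, 0.5))  # 3.0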