ruotianluo / self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and other methods.
MIT License

Incomplete captions generated after reinforcement learning? #283

Open Xiong-can opened 10 months ago

Xiong-can commented 10 months ago

After reinforcement learning, the generated descriptions are sometimes incomplete, such as: a motorcycle parked in a parking lot with a ..

ruotianluo commented 10 months ago

Yes, this does happen. Take a look at the bad ending rate and see roughly how large it is. If you trained with this repo, the rate should not be particularly large. The cause is the CIDEr metric itself. If you add, say, a bad ending penalty to the reward, I think that should alleviate the problem.
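For what it's worth, here is a minimal sketch of what such a check and penalty might look like, assuming the sampled captions are available as a plain list of strings `caps_gen` aligned with the per-caption reward array returned by the CIDEr scorer (all names and the word list below are illustrative, not code from this repo or from M2):

    import numpy as np

    # Words that typically end a truncated caption ("... with a", "... of the").
    # Illustrative list only; tune it to your vocabulary.
    BAD_ENDINGS = {'a', 'an', 'the', 'with', 'of', 'in', 'on', 'at', 'and', 'to'}

    def bad_ending_mask(caps):
        """Return a float array with 1.0 for captions whose last word is a bad ending."""
        mask = []
        for cap in caps:
            words = cap.strip().split()
            mask.append(1.0 if (not words or words[-1] in BAD_ENDINGS) else 0.0)
        return np.array(mask, dtype=np.float32)

    # Monitoring: fraction of sampled captions with a bad ending.
    mask = bad_ending_mask(caps_gen)
    print('bad ending rate:', mask.mean())

    # Fold a penalty into the CIDEr reward before converting it to a tensor.
    penalty_weight = 1.0   # arbitrary starting point, needs tuning
    reward = reward - penalty_weight * mask

The penalty weight and the word list would need tuning; the point is just to make sampled sequences that stop mid-phrase receive a lower reward than properly terminated ones.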


Xiong-can commented 10 months ago


The bad ending rate is about 1/2. How should I add a bad ending penalty reward to alleviate this? And may I ask what the bad ending rate of your model is? Thanks!

ruotianluo commented 10 months ago

That's not quite right. The original SCST paper adds the eos token when computing CIDEr (the original CIDEr does not). In my earlier experiments, not adding eos gave about 1/3 bad endings. Did you follow that way of computing CIDEr?
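Concretely, "adding eos" means appending an explicit end token to both the sampled captions and the references before they go into the CIDEr scorer, and building the IDF statistics (doc_frequency / ref_len) from references processed the same way. A minimal sketch, assuming a hypothetical `EOS` token, a list of candidate strings and a list of reference lists (not code from either repo):

    EOS = '<eos>'

    def add_eos(caps):
        """Append an explicit end token to each caption string."""
        return [c.strip() + ' ' + EOS for c in caps]

    # Candidates and every reference group must get the same treatment, and the
    # corpus used to precompute document frequencies must also contain the eos
    # token, otherwise its IDF weight is meaningless.
    caps_gen = add_eos(caps_gen)
    caps_gt = [add_eos(refs) for refs in caps_gt]

With eos taking part in the n-gram matching, a caption that stops at "with a" no longer matches the reference endings, which is what discourages truncated sequences.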


Xiong-can commented 10 months ago


My CIDEr is computed with the evaluation code from the M2 (meshed-memory) paper. I believe eos is already included when the score is computed during reinforcement learning. This is the main part of my reinforcement-learning code:

Rewards

        # decode the sampled sequences into caption strings
        caps_gen = text_field.decode(out.view(-1, seq_len))
        # repeat each image's ground-truth captions beam_size times so they align with the samples
        caps_gt = list(itertools.chain(*([c, ] * beam_size for c in data["text"])))
        # PTB-tokenize both the generated captions and the references
        caps_gen, caps_gt = tokenizer_pool.map(evaluation.PTBTokenizer.tokenize, [caps_gen, caps_gt])
        # index [1] gives the per-caption CIDEr scores, used as the reward
        reward = cider.compute_score(caps_gt, caps_gen)[1].astype(np.float32)
        reward = torch.from_numpy(reward).to(device).view(detections.shape[0], beam_size)
        # baseline: mean reward over the beam (as in M2), rather than the greedy-decoding baseline of the original SCST paper
        reward_baseline = torch.mean(reward, -1, keepdim=True)
        # policy-gradient loss: advantage-weighted mean negative log-probability of each sampled caption
        loss = -torch.mean(log_prob, -1) * (reward - reward_baseline)
        loss = loss.mean()

This is how the CIDEr score is computed:

    def compute_cider(self):
        def counts2vec(cnts):
            """
            Function maps counts of ngram to vector of tfidf weights.
            The function returns vec, an array of dictionary that store mapping of n-gram and tf-idf weights.
            The n-th entry of array denotes length of n-grams.
            :param cnts:
            :return: vec (array of dict), norm (array of float), length (int)
            """
            vec = [defaultdict(float) for _ in range(self.n)]
            length = 0
            norm = [0.0 for _ in range(self.n)]
            for (ngram, term_freq) in cnts.items():
                # give word count 1 if it doesn't appear in reference corpus
                df = np.log(max(1.0, self.doc_frequency[ngram]))
                # ngram index
                n = len(ngram) - 1
                # tf (term_freq) * idf (precomputed idf) for n-grams
                vec[n][ngram] = float(term_freq) * (self.ref_len - df)
                # compute norm for the vector.  the norm will be used for computing similarity
                norm[n] += pow(vec[n][ngram], 2)

                if n == 1:
                    length += term_freq
            norm = [np.sqrt(n) for n in norm]
            return vec, norm, length

        def sim(vec_hyp, vec_ref, norm_hyp, norm_ref, length_hyp, length_ref):
            '''
            Compute the cosine similarity of two vectors.
            :param vec_hyp: array of dictionary for vector corresponding to hypothesis
            :param vec_ref: array of dictionary for vector corresponding to reference
            :param norm_hyp: array of float for vector corresponding to hypothesis
            :param norm_ref: array of float for vector corresponding to reference
            :param length_hyp: int containing length of hypothesis
            :param length_ref: int containing length of reference
            :return: array of score for each n-grams cosine similarity
            '''
            delta = float(length_hyp - length_ref)
            # measure cosine similarity
            val = np.array([0.0 for _ in range(self.n)])
            for n in range(self.n):
                # ngram
                for (ngram, count) in vec_hyp[n].items():
                    # vrama91 : added clipping
                    val[n] += min(vec_hyp[n][ngram], vec_ref[n][ngram]) * vec_ref[n][ngram]

                if (norm_hyp[n] != 0) and (norm_ref[n] != 0):
                    val[n] /= (norm_hyp[n] * norm_ref[n])

                assert(not math.isnan(val[n]))
                # vrama91: added a length based gaussian penalty
                val[n] *= np.e**(-(delta**2)/(2*self.sigma**2))
            return val

        scores = []
        for test, refs in zip(self.ctest, self.crefs):
            # compute vector for test captions
            vec, norm, length = counts2vec(test)
            # compute vector for ref captions
            score = np.array([0.0 for _ in range(self.n)])
            for ref in refs:
                vec_ref, norm_ref, length_ref = counts2vec(ref)
                score += sim(vec, vec_ref, norm, norm_ref, length, length_ref)
            # change by vrama91 - mean of ngram scores, instead of sum
            score_avg = np.mean(score)
            # divide by number of references
            score_avg /= len(refs)
            # multiply score by 10
            score_avg *= 10.0
            # append score of an image to the score list
            scores.append(score_avg)
        return scores

    def compute_score(self):
        # compute cider score
        score = self.compute_cider()
        # debug
        # print score
        return np.mean(np.array(score)), np.array(score)

ruotianluo commented 10 months ago

The M2 code is the problematic one. My 1/3 result came from running M2, and as I recall, M2 does not add eos.


Xiong-can commented 10 months ago


Then how should I alleviate this problem? Should I add a bad ending penalty reward, or modify the way CIDEr is computed? Is CIDEr-D the variant that adds a penalty factor?

ruotianluo commented 10 months ago

1. I haven't tried the penalty reward, but I think it would be interesting to try.
2. You can modify the CIDEr computation inside M2 to include eos (note that eos needs to be added both during preprocessing and when computing the score).
3. Alternatively, you can add an XE loss to balance things out.
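A minimal sketch of option 3, assuming `loss_rl` is the policy-gradient loss computed as in the snippet above and that decoder logits, ground-truth token ids, and a pad index are available for a standard cross-entropy term (the variable names and the 0.1 weight are illustrative only):

    import torch.nn.functional as F

    # Standard XE (teacher-forcing) loss on the ground-truth captions.
    # logits: (batch, seq_len, vocab_size), targets: (batch, seq_len)
    loss_xe = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                              targets.reshape(-1),
                              ignore_index=pad_idx)

    # Mix the two objectives; lambda_xe trades complete, grammatical endings
    # against the CIDEr reward and needs tuning.
    lambda_xe = 0.1
    loss = loss_rl + lambda_xe * loss_xe

The idea is that the XE term keeps the model anchored to complete reference sentences while the reward term keeps optimizing CIDEr.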
