gpxhyfzqz commented 4 months ago

During training I get this reward curve. Is it normal? The code below comes from https://github.com/weimingboya/DFT/blob/main/common/utils/utils.py

```python
import itertools
import multiprocessing

import numpy as np
import torch
from tqdm import tqdm

import evaluation  # repo package providing PTBTokenizer; exact import path may differ


def train_scst(model, dataloader, optim, cider, text_field, epoch, device='cuda', scheduler=None, args=None):
    # Training with self-critical
    model.train()
    lr = optim.state_dict()['param_groups'][0]['lr']  # LR at epoch start (not refreshed in the loop)

    tokenizer_pool = multiprocessing.Pool()
    running_reward = .0
    running_reward_baseline = .0
    running_loss = .0
    seq_len = 20
    beam_size = 5

    with tqdm(desc='Epoch %d - train' % epoch, unit='it', total=len(dataloader)) as pbar:
        for it, (features, caps_gt) in enumerate(dataloader):
            regions = features[0].to(device)
            grids = features[1].to(device)
            # beam_size captions per image, with per-token log-probabilities
            outs, log_probs = model.beam_search(regions, seq_len, text_field.vocab.stoi['<eos>'],
                                                beam_size, out_size=beam_size,
                                                **{'grid': grids})
            optim.zero_grad()

            # Rewards: CIDEr of each generated caption against its image's references
            caps_gen = text_field.decode(outs.view(-1, seq_len))
            caps_gt = list(itertools.chain(*([c, ] * beam_size for c in caps_gt)))
            caps_gen, caps_gt = tokenizer_pool.map(evaluation.PTBTokenizer.tokenize, [caps_gen, caps_gt])
            reward = cider.compute_score(caps_gt, caps_gen)[1].astype(np.float32)
            reward = torch.from_numpy(reward).to(device).view(regions.shape[0], beam_size)
            # Baseline: mean reward over the beam_size captions of the same image
            reward_baseline = torch.mean(reward, -1, keepdim=True)
            loss = -torch.mean(log_probs, -1) * (reward - reward_baseline)

            loss = loss.mean()
            loss.backward()
            optim.step()

            if scheduler is not None:
                scheduler.step()  # called once per batch, after optim.step()

            running_loss += loss.item()
            running_reward += reward.mean().item()
            running_reward_baseline += reward_baseline.mean().item()
            pbar.set_postfix(loss=running_loss / (it + 1), reward=running_reward / (it + 1), lr=lr)
            pbar.update()

    loss = running_loss / len(dataloader)
    reward = running_reward / len(dataloader)
    reward_baseline = running_reward_baseline / len(dataloader)
    return loss, reward, reward_baseline
```
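For reference, here is the loss from the code above reduced to a single image and written in plain Python. This is a minimal sketch, assuming each of the image's beam captions already has a CIDEr reward and a token-averaged log-probability; the function name is hypothetical:

```python
from typing import List


def scst_loss_for_image(rewards: List[float], mean_log_probs: List[float]) -> float:
    """Self-critical loss for one image's beam captions, mirroring the posted code:
    baseline = mean reward of this image's captions,
    loss_i = -mean_log_prob_i * (reward_i - baseline), then averaged over captions."""
    if not rewards or len(rewards) != len(mean_log_probs):
        raise ValueError("need one token-averaged log-prob per reward")
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]  # these sum to (about) zero
    losses = [-lp * adv for lp, adv in zip(mean_log_probs, advantages)]
    return sum(losses) / len(losses)


# Higher-reward captions already have higher log-probability -> negative loss
print(scst_loss_for_image([1.2, 0.8, 1.0], [-0.5, -0.7, -0.6]))
```

Because the baseline is the mean of the same image's rewards, the loss is negative whenever better-than-average captions already have higher log-probability, and exactly zero when all beams get the same reward. So an SCST loss curve need not decrease toward zero the way a cross-entropy curve does, and its shape can differ between runs that both train fine.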

ruotianluo commented 4 months ago

That is a bit weird. Do you also see such behavior in the loss or in (reward - reward_baseline)?
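On the (reward - reward_baseline) question: with the baseline defined as the mean over an image's beams, the batch mean of this quantity is zero by construction, so its spread is the more informative thing to log. A minimal sketch in plain Python, assuming the rewards are available per image as lists; the function name and the returned keys are hypothetical:

```python
import statistics
from typing import Dict, List


def advantage_stats(rewards_per_image: List[List[float]]) -> Dict[str, float]:
    """Summarize reward - reward_baseline for a batch. Each inner list holds the
    rewards of one image's beam captions; the baseline is their mean."""
    advantages: List[float] = []
    for rewards in rewards_per_image:
        baseline = statistics.fmean(rewards)
        advantages.extend(r - baseline for r in rewards)
    return {
        "mean": statistics.fmean(advantages),  # ~0 by construction
        "mean_abs": statistics.fmean([abs(a) for a in advantages]),
        "std": statistics.pstdev(advantages),
    }


print(advantage_stats([[1.0, 3.0], [2.0, 2.0]]))
```

If the spread collapses toward zero, all beams of an image receive nearly the same reward and the gradient signal vanishes, which is worth checking alongside the reward curve.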

gpxhyfzqz commented 4 months ago

Thanks for the reply. This is the first time I've noticed this issue; my curves were normal before. Here is my training loss, and here are the CIDEr scores on the test set.

ruotianluo commented 4 months ago

[W&B chart, 6/11/2024, 10:10:38 PM]

Your train_loss is a little different from mine.

gpxhyfzqz commented 4 months ago

The loss curve of my other experiment is also different from yours, but its CIDEr score curve looks normal. I'm going back over my code to look for the problem; it may be in the SCST training loss. If I find it, I'll reply here.

gpxhyfzqz commented 4 months ago

I have another question. In my previous experiments, larger batch_size settings tended to give much lower CIDEr scores than smaller ones. Could a bigger batch_size affect training? Thank you very much for your answer.

ruotianluo commented 4 months ago

Can you try without beam search and see if that changes anything? Also, please follow https://www.iris.unimore.it/retrieve/95abce70-9754-4d80-8a1e-63962f3803dc/2305.12254.pdf: M2 does not add eos, which is wrong.
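The eos point is easiest to see on token ids. A minimal sketch, assuming captions are sequences of integer ids and `eos_id` is the end token (the function name is hypothetical): the string handed to CIDEr keeps the first eos instead of dropping it. With eos kept in both generated and reference strings, the n-grams at the end of a caption (e.g. "with a eos") must match how references end, which penalizes captions that trail off mid-phrase. If I recall correctly, self-critical.pytorch builds its reward strings in a similar way, with eos at index 0.

```python
from typing import List, Sequence


def ids_to_reward_string(token_ids: Sequence[int], eos_id: int) -> str:
    """Turn a decoded id sequence into the string scored by CIDEr,
    keeping the first end-of-sequence token instead of stripping it."""
    kept: List[str] = []
    for token in token_ids:
        kept.append(str(token))
        if token == eos_id:
            break  # stop after the first eos; padding after it is ignored
    return " ".join(kept)


print(ids_to_reward_string([4, 9, 0, 0, 0], eos_id=0))  # eos (0) is kept once
```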

gpxhyfzqz commented 4 months ago

It was a problem in my code. PyTorch had given me this warning:

UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate.

Because of this warning, I had moved scheduler.step() to where it is in the code above (inside the batch loop, right after optim.step()). I have now changed that, and the curve looks normal.
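Why the placement matters: a scheduler such as `torch.optim.lr_scheduler.StepLR` multiplies the LR by `gamma` every `step_size` calls to `step()`, so calling it once per batch instead of once per epoch makes the schedule advance `len(dataloader)` times faster. The thread does not say which scheduler was used or exactly what was changed; this is a plain-Python sketch assuming a StepLR-style schedule whose `step_size` is meant in epochs, with hypothetical function names and numbers:

```python
import math


def step_lr(base_lr: float, gamma: float, step_size: int, num_steps: int) -> float:
    """Closed form of a StepLR-style schedule after `num_steps` calls to step()."""
    return base_lr * gamma ** (num_steps // step_size)


def lr_after_epochs(base_lr: float, gamma: float, step_size: int, epochs: int,
                    dataset_size: int, batch_size: int, *, per_batch: bool) -> float:
    """LR after `epochs` epochs when scheduler.step() runs once per batch
    (per_batch=True, as in the posted code) or once per epoch (per_batch=False)."""
    batches_per_epoch = math.ceil(dataset_size / batch_size)  # len(dataloader) without drop_last
    num_steps = epochs * batches_per_epoch if per_batch else epochs
    return step_lr(base_lr, gamma, step_size, num_steps)


# One epoch, 1000 samples, LR halved every 3 "epochs"
print(lr_after_epochs(1.0, 0.5, 3, 1, 1000, 50, per_batch=False))  # schedule not yet decayed
print(lr_after_epochs(1.0, 0.5, 3, 1, 1000, 50, per_batch=True))   # already decayed 6 times
```

With per-batch stepping the LR also depends on batch_size, not only on the epoch count, which could be one contributor to the batch_size question above, though that is a guess.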

gpxhyfzqz commented 4 months ago

Thank you for your help!!

ruotianluo/self-critical.pytorch, issue #287: Is the reward curve normal?