viig99 / LS-ACELoss

Label smoothed Aggregation cross entropy loss for generalisation in sequence to sequence tasks.
MIT License

accuracy about LS-ACELoss #1

Open bjlgcxc opened 4 years ago

bjlgcxc commented 4 years ago

I used the original ACELoss to train a CRNN on the synth90k dataset, but it cannot converge. I wonder if LS-ACELoss has been used in such training as I describe? And what is the accuracy?

viig99 commented 4 years ago

ACELoss mostly works well when the network architecture is a CNN variant (I tried it with an EfficientNet variant), and when the final time dimension is pretty small, i.e. <100 timesteps. I tried it on a few different datasets like COCO, SynthText, ICDAR and some custom datasets, and was able to reach ~96.5% character accuracy using ACELoss, compared to roughly 96% with CTC loss. Training was much faster with ACELoss, and it gave proper confidence estimates on the mistakes it made, thus having better generalization.

bjlgcxc commented 4 years ago

Thank you for your reply. I have trained many times using a CRNN, the dataset is synth90k, and the loss is the ACELoss you provide. I use 8 GPUs with a batch size of 64 per GPU, the initial lr is 0.001, and the optimizer is Adam. Although the loss declines, the accuracy stays near zero. Do you know why?

And when testing, how do you get the final prediction result? Is it like this?

    def decode_batch(self):
        out_best = torch.max(self.softmax, 2)[1].data.cpu().numpy()
        pre_result = [0] * self.bs
        for j in range(self.bs):
            pre_result[j] = out_best[j][out_best[j] != 0]
        return pre_result
viig99 commented 4 years ago

    for j in range(self.bs):
        # keep a class only if it is non-blank and not a repeat of the previous time step
        pre_result[j] = [c for i, c in enumerate(out_best[j])
                         if c != 0 and not (i > 0 and out_best[j][i - 1] == c)]
    # something along these lines

The paper's results are also on CNN-LSTM variants, so it should work well on that architecture. I am not sure why your accuracy is near zero; in the initial epochs ACELoss learns slowly, but it starts converging within 10-20 epochs.

Perhaps you can use CTC loss to begin with, to check that the rest of the code works fine, and then switch to ACELoss.
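
Something like the sketch below is what I mean; it assumes your model emits (T, B, n_vocab) logits with class 0 as the blank, and that the encoded targets and lengths are the same tensors you would feed to ACE (the helper name here is just for illustration):

    import torch
    import torch.nn as nn

    # sanity-check criterion: standard CTC loss over the same encoded targets
    ctc_criterion = nn.CTCLoss(blank=0, zero_infinity=True)

    def ctc_sanity_loss(y_pred, targets, target_lengths):
        log_probs = y_pred.log_softmax(2)                      # (T, B, n_vocab)
        T, B, _ = log_probs.shape
        input_lengths = torch.full((B,), T, dtype=torch.long)  # every time step is valid
        return ctc_criterion(log_probs, targets, input_lengths, target_lengths)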

bjlgcxc commented 4 years ago

Yeah, the code I use works well with CTC loss.

And is the code below suitable for CTC as well as ACE?

    for j in range(self.bs):
        # keep a class only if it is non-blank and not a repeat of the previous time step
        pre_result[j] = [c for i, c in enumerate(out_best[j])
                         if c != 0 and not (i > 0 and out_best[j][i - 1] == c)]
    # something along these lines
viig99 commented 4 years ago

Yes, essentially it's doing pretty much the same thing: drop the time frames with class 0 (the blank class), and skip classes that occur consecutively without a blank in between (repetitions). Not sure why it's not working otherwise. The code provided might not work exactly as written, but I included it for readability over pseudo-code.
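
A self-contained version of that rule might look like the following (just a sketch, assuming blank is class 0 and out_best is the (B, T) argmax array from your snippet):

    def greedy_decode(out_best, blank=0):
        # out_best: per-sample sequences of argmax class indices, shape (B, T)
        results = []
        for seq in out_best:
            decoded, prev = [], blank
            for idx in seq:
                # keep a class only if it is not blank and differs from the previous frame
                if idx != blank and idx != prev:
                    decoded.append(int(idx))
                prev = idx
            results.append(decoded)
        return results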

bjlgcxc commented 4 years ago

OK, I understand the code, since I use the same code as you for CTC. But when I test the ACE model, I use the code below. I don't know if the decode code matters much.

    def decode_batch(self):
        out_best = torch.max(self.softmax, 2)[1].data.cpu().numpy()
        pre_result = [0] * self.bs
        for j in range(self.bs):
            pre_result[j] = out_best[j][out_best[j] != 0]
        return pre_result
viig99 commented 4 years ago

You are right, it shouldn't matter much. See https://github.com/summerlvsong/Aggregation-Cross-Entropy/blob/master/source/models/seq_module.py#L52; the code is pretty much in line with https://github.com/summerlvsong/Aggregation-Cross-Entropy/blob/master/source/models/seq_module.py#L28, with added modifications for label smoothing.
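
For intuition, a rough sketch of what ACE with label smoothing can look like is below; this is not the exact code from this repo, and the shapes and smoothing scheme are only illustrative:

    import torch

    class ACELabelSmoothingSketch(torch.nn.Module):
        # Illustrative only: aggregate per-class probability mass over time and
        # compare it with smoothed, T-normalized character counts.

        def __init__(self, smoothing=0.1):
            super().__init__()
            self.smoothing = smoothing

        def forward(self, logits, char_counts):
            # logits: (T, B, n_class); char_counts: (B, n_class) with class 0 = blank,
            # where char_counts[b, k] is how often class k appears in sample b's label.
            T_, bs, n_class = logits.shape
            probs = logits.softmax(2).sum(0) / T_        # aggregate over time: (B, n_class)
            counts = char_counts.float().clone()
            counts[:, 0] = T_ - counts[:, 1:].sum(1)     # blank count fills the remaining frames
            targets = counts / T_                        # each row sums to 1
            # label smoothing: blend the count distribution with a uniform one
            targets = (1 - self.smoothing) * targets + self.smoothing / n_class
            return (-torch.sum(targets * torch.log(probs + 1e-10))) / bs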

viig99 commented 4 years ago

Also to note, I tried both of the following:

        # Variant 1: plain cross-entropy between the aggregated probs and the T-normalized targets
        targets_padded = targets_padded / T_
        loss2 = (-torch.sum(torch.log(probs) * targets_padded)) / bs
        return loss2

        # Variant 2: KL divergence against the L1-normalized targets
        targets_padded = targets_padded / T_
        targets_padded = F.normalize(targets_padded, p=1, dim=1)
        return F.kl_div(torch.log(probs), targets_padded, reduction='batchmean')

and they both worked well for me
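
For what it's worth, assuming targets_padded already includes the blank count (so each row sums to 1 after dividing by T_), the KL-divergence form differs from the cross-entropy form only by the entropy of the targets, which is constant with respect to the network outputs, so it makes sense that both trained similarly.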

bjlgcxc commented 4 years ago

OK, good information, I will try again

bjlgcxc commented 4 years ago

@viig99 I tried many times again but the problem still remains. Could you please release some more of your ACE Loss training code? Or maybe share some simple code with me, thanks very much.

viig99 commented 4 years ago

Ah, sadly it's commercial code, so I can't release it. Here are some relevant pieces though, similar to the CRNN codebase.

t is the sequence tensor, right-padded with 0, and l is the length tensor of all the sequences in the batch, exactly the same as what the PyTorch CTC loss needs as input.

text = torch.IntTensor(opt.batchSize * 5)
length = torch.IntTensor(opt.batchSize)
criterion = ACELabelSmoothingLoss()

def loadData(v, data):
    v.resize_as_(data).copy_(data, non_blocking=False)

def LossFunction(y_pred, y):
    y_pred = y_pred.permute(1, 0, 2).contiguous()  # time, Batch, n_vocab
    t, l = converter.encode(y)
    loadData(text, t)
    loadData(length, l)
    preds_size = torch.IntTensor([y_pred.size(0)] * y_pred.size(1))
    return criterion(y_pred.to(device), text, preds_size, length)

This is how the loss is backpropagated

        x, y = batch
        y_pred = model(x)
        loss = LossFunction(y_pred, y)
        loss.backward()
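
The surrounding step is nothing special; a rough sketch of the loop around it (the optimizer and learning rate here are placeholders, not the actual settings I used):

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder optimizer

    for batch in train_loader:
        optimizer.zero_grad()
        x, y = batch
        y_pred = model(x)
        loss = LossFunction(y_pred, y)
        loss.backward()
        optimizer.step()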

This is how I am calculating exact accuracy; the Accuracy class is from pytorch-ignite, and it essentially just computes _num_correct / _num_examples.

    from ignite.metrics import Accuracy  # base metric class from pytorch-ignite

    class ExactAccuracy(Accuracy):

        @torch.no_grad()
        def update(self, output):
            y_pred, y = output
            y_pred = y_pred.permute(1, 0, 2)  # time, batch, n_vocab
            preds_size = torch.IntTensor([y_pred.size(0)] * y_pred.size(1))
            _, preds = y_pred.max(2)
            preds = preds.transpose(1, 0).contiguous().view(-1)
            sim_preds = converter.decode(
                preds.tolist(), preds_size.tolist(), raw=False)
            batch_exact_matches = 0
            for pred, target in zip(sim_preds, y):
                correct = 1 if pred == target else 0
                batch_exact_matches += correct
            self._num_correct += batch_exact_matches
            self._num_examples += y_pred.size(1)

where converter.encode is essentially the same as https://github.com/meijieru/crnn.pytorch/blob/master/utils.py#L32.
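
If it helps, a minimal stand-in for that converter could look like the sketch below (loosely modeled on crnn.pytorch's strLabelConverter; not the actual class I use, and class 0 is reserved for the blank):

    import torch

    class LabelConverter:
        # Maps strings to a flat index tensor plus per-sample lengths, and decodes
        # argmax sequences back to strings by dropping blanks and collapsing repeats.

        def __init__(self, alphabet):
            self.alphabet = alphabet
            self.char2idx = {c: i + 1 for i, c in enumerate(alphabet)}  # 0 is the blank

        def encode(self, texts):
            lengths = [len(t) for t in texts]
            indices = [self.char2idx[c] for t in texts for c in t]
            return torch.IntTensor(indices), torch.IntTensor(lengths)

        def decode(self, preds, preds_size, raw=False):
            results, pos = [], 0
            for length in preds_size:
                seq = preds[pos:pos + length]
                pos += length
                if raw:
                    results.append(''.join(self.alphabet[i - 1] if i > 0 else '-' for i in seq))
                else:
                    chars = [self.alphabet[seq[i] - 1] for i in range(length)
                             if seq[i] != 0 and not (i > 0 and seq[i - 1] == seq[i])]
                    results.append(''.join(chars))
            return results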

Hope this sheds some more light and helps you.

bjlgcxc commented 4 years ago

appreciate it very much!

bjlgcxc commented 4 years ago

@viig99 I have pushed code based on crnn.pytorch -> https://github.com/bjlgcxc/CRNN_ACE_Loss; training with this code also cannot converge when using ACE loss. Could you help me find what might cause this problem? The training data was created by the code in the data folder, based on 1000 samples extracted from the synth90k dataset. When I use CTC loss to train, the model converges and the validation accuracy is > 0.6. When I use ACE loss, the validation accuracy is 0 all the time. I trained with ACE loss using Adadelta, 1 GPU, lr=0.01 and 200 epochs.