bjlgcxc opened this issue 4 years ago
ACELoss mostly works well when the network architecture is a CNN variant (I tried an EfficientNet variant), and when the final time dimension is fairly small, i.e. <100 timesteps. I tried it on a few different datasets like COCO, SynthText, ICDAR, and some custom datasets, and was able to reach ~96.5% character accuracy using ACELoss, compared to ~96% with CTC loss. Training was much faster with ACELoss, and it gave proper confidence estimates on the mistakes it made, thus generalizing better.
Thank you for your reply. I have trained many times using CRNN, and the dataset is Synth90k, with the ACELoss you provided. I use 8 GPUs with a batch size of 64 per GPU, an initial lr of 0.001, and the Adam optimizer. Although the loss declines, the accuracy is near zero. Do you know why?
And when testing, how do you get the final prediction result? Is it like this?
def decode_batch(self):
    # greedy decode: take the argmax class at each time step
    out_best = torch.max(self.softmax, 2)[1].data.cpu().numpy()
    pre_result = [0] * self.bs
    for j in range(self.bs):
        # drop blank (class 0) frames; repeated classes are not collapsed
        pre_result[j] = out_best[j][out_best[j] != 0]
    return pre_result
for j in range(self.bs):
    for i in range(len(out_best[j])):
        # keep frame i only if it is not blank and not a repeat of the previous frame
        out_best[j][i] != 0 and (not (i > 0 and out_best[j][i - 1] == out_best[j][i]))
# something along these lines
The paper reports results on CNN-LSTM variants as well, so it should work well on that architecture. Not sure why your accuracy is near zero; for the initial epochs ACELoss learns slowly, but it starts converging within 10-20 epochs.
Perhaps you can start with CTC loss to check that the rest of the code works fine, and then switch to ACELoss.
Yeah, the code I use works well with CTC loss.
And is the code below suitable for CTC as well as ACE?
for j in range(self.bs):
    for i in range(len(out_best[j])):
        out_best[j][i] != 0 and (not (i > 0 and out_best[j][i - 1] == out_best[j][i]))  # something along these lines
Yes, essentially it's doing pretty much the same thing: drop the time frames with class 0 (the blank class), and skip classes that occur consecutively without a blank in between (repetitions). Not sure why it's not working otherwise. The code provided might not work exactly as written, but I included it for readability over pseudo-code.
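For reference, a minimal runnable sketch of that decode (my illustration, not code from this thread; it assumes blank is class 0 and softmax_out has shape (batch, time, classes)):

import torch

def greedy_decode(softmax_out, blank=0):
    # take the argmax class at every time step: (batch, time)
    best_path = softmax_out.argmax(dim=2).cpu().numpy()
    results = []
    for path in best_path:
        decoded = []
        prev = blank
        for cls in path:
            # keep a frame only if it is not blank and not a repetition
            # of the immediately preceding frame
            if cls != blank and cls != prev:
                decoded.append(int(cls))
            prev = cls
        results.append(decoded)
    return results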
OK, I know the code; I use the same code as you for CTC. But when I test the ACE model, I use the code below. I don't know if the decode code matters much.
def decode_batch(self):
    out_best = torch.max(self.softmax, 2)[1].data.cpu().numpy()
    pre_result = [0] * self.bs
    for j in range(self.bs):
        pre_result[j] = out_best[j][out_best[j] != 0]
    return pre_result
You are right, it shouldn't matter much. See https://github.com/summerlvsong/Aggregation-Cross-Entropy/blob/master/source/models/seq_module.py#L52; the code is pretty much in line with https://github.com/summerlvsong/Aggregation-Cross-Entropy/blob/master/source/models/seq_module.py#L28, with added modifications for label smoothing.
Also to note, I tried both

# variant 1: cross-entropy between the aggregated predictions and the counts
targets_padded = targets_padded / T_
loss2 = (-torch.sum(torch.log(probs) * targets_padded)) / bs
return loss2

and

# variant 2: KL divergence against the normalized count distribution
targets_padded = targets_padded / T_
targets_padded = F.normalize(targets_padded, p=1, dim=1)
return F.kl_div(torch.log(probs), targets_padded, reduction='batchmean')

and they both worked well for me.
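To make those fragments concrete, here is a minimal self-contained sketch of an ACE-style loss built around them (my reconstruction, not viig99's code; it assumes class 0 is the blank and follows the count aggregation from the linked seq_module.py, without the label-smoothing modifications):

import torch
import torch.nn.functional as F

def ace_loss(logits, targets, target_lengths, use_kl=False):
    # logits: (T, B, C) raw outputs; targets: (B, L) label indices
    # right-padded with 0; target_lengths: (B,) true label lengths
    T_, bs, n_class = logits.size()
    # aggregate the per-frame class distributions over time: (B, C)
    probs = F.softmax(logits, dim=2).sum(dim=0) / T_
    probs = probs.clamp_min(1e-10)
    # build the per-class character counts for each sample
    targets_padded = torch.zeros(bs, n_class, device=logits.device)
    for b in range(bs):
        for c in targets[b, :int(target_lengths[b])]:
            targets_padded[b, c] += 1
    # the blank class absorbs the remaining time steps
    targets_padded[:, 0] = (T_ - target_lengths).to(targets_padded)
    targets_padded = targets_padded / T_
    if use_kl:
        # variant 2: KL divergence against the count distribution
        targets_padded = F.normalize(targets_padded, p=1, dim=1)
        return F.kl_div(torch.log(probs), targets_padded, reduction='batchmean')
    # variant 1: cross-entropy between aggregates and counts
    return (-torch.sum(torch.log(probs) * targets_padded)) / bs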
OK, good information, I will try again
@viig99 I tried many times again but the problem still remains. Could you please release more of your training code with ACE Loss? Or maybe share some simple code with me? Thanks very much.
Ah, sadly it's commercial code, so I can't release it. Here are some relevant pieces though, similar to the CRNN codebase.
t is the target sequence tensor, right-padded with 0, and l is the length tensor of all the sequences in the batch, exactly the same as what the PyTorch CTC loss needs as input.
text = torch.IntTensor(opt.batchSize * 5)  # flat buffer for the encoded targets
length = torch.IntTensor(opt.batchSize)    # per-sample target lengths
criterion = ACELabelSmoothingLoss()

def loadData(v, data):
    # copy new data into the preallocated buffer, resizing as needed
    v.resize_as_(data).copy_(data, non_blocking=False)

def LossFunction(y_pred, y):
    y_pred = y_pred.permute(1, 0, 2).contiguous()  # (time, batch, n_vocab)
    t, l = converter.encode(y)
    loadData(text, t)
    loadData(length, l)
    preds_size = torch.IntTensor([y_pred.size(0)] * y_pred.size(1))
    return criterion(y_pred.to(device), text, preds_size, length)
This is how the loss is backpropagated:

x, y = batch
y_pred = model(x)
loss = LossFunction(y_pred, y)
loss.backward()
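(The optimizer step isn't shown in the thread; assuming a standard PyTorch training loop, it would wrap the above roughly like this, with optimizer being whatever optimizer is in use:)

optimizer.zero_grad()  # clear gradients from the previous step
loss.backward()        # backpropagate the ACE loss
optimizer.step()       # apply the parameter update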
This is how I am calculating exact accuracy. The Accuracy class is from pytorch-ignite, and essentially just calculates _num_correct / _num_examples.
class ExactAccuracy(Accuracy):
    @torch.no_grad()
    def update(self, output):
        y_pred, y = output
        y_pred = y_pred.permute(1, 0, 2)  # (time, batch, n_vocab)
        preds_size = torch.IntTensor([y_pred.size(0)] * y_pred.size(1))
        _, preds = y_pred.max(2)
        preds = preds.transpose(1, 0).contiguous().view(-1)
        sim_preds = converter.decode(
            preds.tolist(), preds_size.tolist(), raw=False)
        batch_edit_accuracy = 0
        for pred, target in zip(sim_preds, y):
            # exact string match between prediction and ground truth
            correct = 1 if pred == target else 0
            batch_edit_accuracy += correct
        self._num_correct += batch_edit_accuracy
        self._num_examples += y_pred.size(1)
where converter.encode is essentially the same as https://github.com/meijieru/crnn.pytorch/blob/master/utils.py#L32.
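For readers without that codebase handy, a minimal sketch of such a converter's encode (my illustration modeled on crnn.pytorch's strLabelConverter; the real class also implements decode and other details):

import torch

class LabelConverter:
    # index 0 is reserved for the blank class; characters start at 1
    def __init__(self, alphabet):
        self.dict = {char: i + 1 for i, char in enumerate(alphabet)}

    def encode(self, texts):
        # texts: list of label strings for the batch; returns the
        # concatenated target indices and the per-sample lengths
        lengths = [len(s) for s in texts]
        flat = [self.dict[char] for s in texts for char in s]
        return torch.IntTensor(flat), torch.IntTensor(lengths)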
Hope this sheds some more light and helps you.
appreciate it very much!
@viig99 I have pushed the code based on crnn.pytorch -> https://github.com/bjlgcxc/CRNN_ACE_Loss. Training with this code also cannot converge when using ACE loss. Could you help me find what may be causing this problem? The training data was created by the code in the data folder, based on 1000 samples extracted from the Synth90k dataset. When I train with CTC loss, the model converges and the validation accuracy is > 0.6. When I use ACE loss, the validation accuracy is 0 all the time. I trained with ACE loss using Adadelta, 1 GPU, lr=0.01, and 200 epochs.
I used the original ACELoss to train CRNN on the Synth90k dataset, but it cannot converge. I wonder whether LS-ACELoss was used in the training I mentioned? And what is the accuracy?