Error while training from checkpoint.

BvSaiAkhil commented 3 years ago

Hi there. I trained the model on ICDAR data. When I use that checkpoint to train on my own data, which has a different number of entities, I get the following error:

RuntimeError: Error(s) in loading state_dict for PICKModel:
    size mismatch for decoder.bilstm_layer.mlp.mlp.0.weight: copying a param with shape torch.Size([11, 1024]) from checkpoint, the shape in current model is torch.Size([15, 1024]).
    size mismatch for decoder.bilstm_layer.mlp.mlp.0.bias: copying a param with shape torch.Size([11]) from checkpoint, the shape in current model is torch.Size([15]).
    size mismatch for decoder.crf_layer.transitions: copying a param with shape torch.Size([11, 11]) from checkpoint, the shape in current model is torch.Size([15, 15]).
    size mismatch for decoder.crf_layer._constraint_mask: copying a param with shape torch.Size([13, 13]) from checkpoint, the shape in current model is torch.Size([17, 17]).
    size mismatch for decoder.crf_layer.start_transitions: copying a param with shape torch.Size([11]) from checkpoint, the shape in current model is torch.Size([15]).
    size mismatch for decoder.crf_layer.end_transitions: copying a param with shape torch.Size([11]) from checkpoint, the shape in current model is torch.Size([15]).

Does this mean that we can't train from the checkpoint? Is there another way to train from a checkpoint using data with a different number of entities? Thanks.

WenshuangSong commented 3 years ago

Hi, have you solved it?

WenshuangSong commented 3 years ago

Hi, have you solved it?

ndcuong91 commented 3 years ago

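Maybe the number of classes changed: according to the error, the checkpoint was saved with 11 output classes, while the current model is built with 15.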
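One way to confirm this is to compare the tensor shapes stored in the checkpoint with those of the freshly built model. The helper below is only a sketch: `shape_mismatches` and `FakeTensor` are hypothetical names, `encoder.weight` is a made-up parameter, and the function works on any mapping whose values have a `.shape` attribute (for example, what `model.state_dict()` returns in PyTorch).

```python
from collections import namedtuple


def shape_mismatches(checkpoint_state, model_state):
    """Return {name: (checkpoint_shape, model_shape)} for entries whose shapes differ."""
    mismatches = {}
    for name, saved in checkpoint_state.items():
        current = model_state.get(name)
        if current is None:
            continue  # parameter absent from the current model; not a shape mismatch
        saved_shape, current_shape = tuple(saved.shape), tuple(current.shape)
        if saved_shape != current_shape:
            mismatches[name] = (saved_shape, current_shape)
    return mismatches


# Stand-in for a tensor so the sketch runs without PyTorch installed.
FakeTensor = namedtuple("FakeTensor", "shape")

checkpoint_state = {
    "decoder.crf_layer.transitions": FakeTensor((11, 11)),
    "encoder.weight": FakeTensor((512, 512)),  # hypothetical parameter name
}
model_state = {
    "decoder.crf_layer.transitions": FakeTensor((15, 15)),
    "encoder.weight": FakeTensor((512, 512)),
}

for name, (saved, current) in shape_mismatches(checkpoint_state, model_state).items():
    print(f"{name}: checkpoint {saved} vs model {current}")
```

If only the decoder entries show up, as in the error above, the difference comes from the number of entity classes and not from the rest of the network.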

compadrejavo commented 3 years ago

I encountered the same error. To solve it, go to PICK-pytorch/utils/entities_list.py and edit the list of classes there.
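If the goal is to keep the new set of entities and still reuse the pretrained weights, a common PyTorch workaround is to load only the tensors whose shapes match and let the class-dependent layers start from their fresh initialization. The sketch below is not PICK's own mechanism: `filter_compatible` and `FakeTensor` are hypothetical names, `encoder.weight` is a made-up parameter, and the `"state_dict"` checkpoint key in the comments is an assumption about how the checkpoint was saved.

```python
from collections import namedtuple


def filter_compatible(checkpoint_state, model_state):
    """Split checkpoint entries into those the model can accept and those it cannot.

    An entry is kept only if the model has a parameter with the same name
    and the same shape. Works on any values exposing a `.shape` attribute.
    """
    kept, skipped = {}, []
    for name, saved in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(saved.shape) == tuple(target.shape):
            kept[name] = saved
        else:
            skipped.append(name)
    return kept, skipped


# Stand-in for a tensor so the sketch runs without PyTorch installed.
FakeTensor = namedtuple("FakeTensor", "shape")

checkpoint_state = {
    "encoder.weight": FakeTensor((512, 512)),  # hypothetical parameter name
    "decoder.crf_layer.transitions": FakeTensor((11, 11)),
    "decoder.crf_layer.start_transitions": FakeTensor((11,)),
}
model_state = {
    "encoder.weight": FakeTensor((512, 512)),
    "decoder.crf_layer.transitions": FakeTensor((15, 15)),
    "decoder.crf_layer.start_transitions": FakeTensor((15,)),
}

kept, skipped = filter_compatible(checkpoint_state, model_state)
print("loaded from checkpoint:", sorted(kept))
print("left at fresh initialization:", sorted(skipped))

# With PyTorch, the same helper would be used roughly like this
# (the "state_dict" key is an assumption about the checkpoint layout):
#   checkpoint = torch.load(checkpoint_path, map_location="cpu")
#   kept, skipped = filter_compatible(checkpoint["state_dict"], model.state_dict())
#   model.load_state_dict(kept, strict=False)
```

The filtering step is needed because `strict=False` only tolerates missing or unexpected keys; a tensor with a mismatched shape still raises the error shown above. The skipped layers (the output MLP and the CRF) are then trained from scratch on the new entities.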

wenwenyu / PICK-pytorch

Error while training from checkpoint. #53