Open cxmscb opened 1 year ago
Thank you very much for the code and models. Could you explain why the model lisrd_aachen.pth is smaller than the model lisrd_vidit.pth?

Hi, if I remember correctly, it is because the checkpoints also contain the state_dict of the optimizer. Thus, if the model trained on VIDIT was trained for longer, the checkpoint lisrd_vidit.pth will be bigger.
However, the 'model_state_dict' should have the same size in both checkpoints (and it is the only data needed when using the pre-trained LISRD; the optimizer state_dict can safely be ignored).
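In case it helps, here is a minimal sketch of how you could inspect the two checkpoints and keep only the model weights. It assumes the usual PyTorch layout where the checkpoint is a dict with a 'model_state_dict' key (mentioned above); the other key names and the output file name are illustrative assumptions.

```python
import torch

# Load both checkpoints on CPU and list what they contain.
for path in ["lisrd_aachen.pth", "lisrd_vidit.pth"]:
    ckpt = torch.load(path, map_location="cpu")
    print(path, list(ckpt.keys()))

    # Number of parameters in the model weights (should match between files).
    n_params = sum(t.numel() for t in ckpt["model_state_dict"].values())
    print("model parameters:", n_params)

# To drop the optimizer state, re-save only the model weights
# (output file name is just an example).
ckpt = torch.load("lisrd_vidit.pth", map_location="cpu")
torch.save({"model_state_dict": ckpt["model_state_dict"]},
           "lisrd_vidit_weights_only.pth")
```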