wafaAlghallabi closed this issue 5 months ago
I have precisely the same issue. The training starts if no checkpoints are provided, but I get the same error when loading the checkpoint.
Hey @wafaAlghallabi. The error was coming from trainer.py, line 504. I was able to solve it by using the get method with a default value instead of accessing the key directly.
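# load model parameters
try:
    # .get() falls back to False when the option is missing from the config
    use_ema = self.cfg.checkpoint.get("use_ema_weights_to_init_param", False)
    if use_ema and "extra_state" in state and "ema" in state["extra_state"]:
        ...  # rest of the block unchanged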
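To illustrate why this change helps, here is a minimal, self-contained sketch. It assumes that the try block in trainer.py catches any exception and re-raises it with the generic "architectures match" message, which is what the traceback suggests. `CheckpointConfig` and `load_params` are hypothetical stand-ins written for this example; they are not the real fairseq or OmegaConf classes.

```python
class CheckpointConfig(dict):
    """Hypothetical stand-in for cfg.checkpoint (not the real config class)."""

    def __getattr__(self, name):
        # Mimic a config object that raises when an unknown option
        # is read as an attribute.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def load_params(checkpoint_cfg, state, filename, safe=True):
    """Return which weights would be loaded: "ema" or "model"."""
    try:
        if safe:
            # Patched version: a missing option falls back to False.
            use_ema = checkpoint_cfg.get("use_ema_weights_to_init_param", False)
        else:
            # Direct access: raises if the option is absent.
            use_ema = checkpoint_cfg.use_ema_weights_to_init_param
        if use_ema and "extra_state" in state and "ema" in state["extra_state"]:
            return "ema"
        return "model"
    except Exception:
        # Assumed behaviour: every failure inside the try block is
        # reported as an architecture mismatch, hiding the real cause.
        raise Exception(
            "Cannot load model parameters from checkpoint {}; "
            "please ensure that the architectures match.".format(filename)
        )


# A config that lacks the option, as assumed for the failing setup.
cfg_without_option = CheckpointConfig(save_dir="checkpoints")
state = {"model": {}, "extra_state": {"ema": {}}}

try:
    load_params(cfg_without_option, state, "biomedgpt_base.pt", safe=False)
    direct_access_error = None
except Exception as err:
    direct_access_error = str(err)

safe_result = load_params(cfg_without_option, state, "biomedgpt_base.pt", safe=True)
```

In this sketch, direct access fails with the misleading architecture message even though the model itself is never touched, while the `get` variant simply falls through to loading the regular model weights.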
Hi @wafaAlghallabi @Gollini, thank you for your interest in my work. I apologize for the delayed response. As soon as I finish my current pressing tasks, I'll look into and resolve the issue you've mentioned. I appreciate your patience and understanding.
Hello again,
Thank you for your time and help.
I'm trying to run the 'train_medmnist.sh' script for the image classification task, but I have an issue loading the checkpoint provided in the repository, biomedgpt_base.pt. It reports an architecture mismatch:
Traceback (most recent call last):
  File "../../train.py", line 537, in <module>
    cli_main()
  File "../../train.py", line 530, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/../.conda/envs/biomedgpt/lib/python3.7/site-packages/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "../../train.py", line 170, in main
    disable_iterator_cache=False,
  File "/../utils/checkpoint_utils.py", line 254, in load_checkpoint
    reset_meters=reset_meters,
  File "/../../trainer.py", line 526, in load_checkpoint
    "please ensure that the architectures match.".format(filename)
Exception: Cannot load model parameters from checkpoint /../checkpoints/biomedgpt_base.pt; please ensure that the architectures match.
May I ask you what could be the issue behind this error, please?