aaaaaannie opened 2 months ago
Could you let me know which scripts and checkpoints you are using?
For the scripts, I used `pretrain_tiny.sh`, located in `scripts/pretrain/`. I only made two modifications to this script:
For the checkpoint, I used `biomedgpt_tiny.pt`, which I downloaded from the Dropbox link provided in `checkpoints.md` (https://www.dropbox.com/sh/cu2r5zkj2r0e6zu/AADZ-KHn-emsICawm9CM4MqVa?dl=0).
These were the key components I utilized for my setup. Let me know if you need any clarification or have additional questions about the configuration.
Could you try installing Fairseq from this repository instead of OFA and re-run the code? Additionally, could you please share the entire error log?
Hi, I've installed Fairseq from your repository, but I still get the same error, shown below:
```
2024-09-23 02:08:07 - train.py[line:154] - INFO: training on 4 devices (GPUs/TPUs)
2024-09-23 02:08:07 - train.py[line:160] - INFO: max tokens per device = None and max sentences per device = 16
2024-09-23 02:08:07 - trainer.py[line:458] - INFO: Preparing to load checkpoint ../../scripts/biomedgpt_tiny.pt
Traceback (most recent call last):
  File "/mypath/trainer.py", line 519, in load_checkpoint
    state["model"], strict=True, model_cfg=self.cfg.model
  File "/mypath/fairseq/fairseq/distributed/module_proxy_wrapper.py", line 52, in load_state_dict
    return self.module.module.load_state_dict(*args, **kwargs)
  File "/mypath/fairseq/fairseq/models/fairseq_model.py", line 125, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/root/miniconda3/envs/biomedgpt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1668, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for OFAModel:
    Unexpected key(s) in state_dict: "encoder.layers.0.attn_ln.weight", "encoder.layers.0.attn_ln.bias", "encoder.layers.0.ffn_layernorm.weight", "encoder.layers.0.ffn_layernorm.bias", "encoder.layers.0.self_attn.c_attn", "encoder.layers.1.attn_ln.weight", "encoder.layers.1.attn_ln.bias", "encoder.layers.1.ffn_layernorm.weight", "encoder.layers.1.ffn_layernorm.bias", "encoder.layers.1.self_attn.c_attn", "encoder.layers.2.attn_ln.weight", "encoder.layers.2.attn_ln.bias", "encoder.layers.2.ffn_layernorm.weight", "encoder.layers.2.ffn_layernorm.bias", "encoder.layers.2.self_attn.c_attn", "encoder.layers.3.attn_ln.weight", "encoder.layers.3.attn_ln.bias", "encoder.layers.3.ffn_layernorm.weight", "encoder.layers.3.ffn_layernorm.bias", "encoder.layers.3.self_attn.c_attn", "decoder.layers.0.self_attn_ln.weight", "decoder.layers.0.self_attn_ln.bias", "decoder.layers.0.cross_attn_ln.weight", "decoder.layers.0.cross_attn_ln.bias", "decoder.layers.0.ffn_layernorm.weight", "decoder.layers.0.ffn_layernorm.bias", "decoder.layers.0.self_attn.c_attn", "decoder.layers.0.encoder_attn.c_attn", "decoder.layers.1.self_attn_ln.weight", "decoder.layers.1.self_attn_ln.bias", "decoder.layers.1.cross_attn_ln.weight", "decoder.layers.1.cross_attn_ln.bias", "decoder.layers.1.ffn_layernorm.weight", "decoder.layers.1.ffn_layernorm.bias", "decoder.layers.1.self_attn.c_attn", "decoder.layers.1.encoder_attn.c_attn", "decoder.layers.2.self_attn_ln.weight", "decoder.layers.2.self_attn_ln.bias", "decoder.layers.2.cross_attn_ln.weight", "decoder.layers.2.cross_attn_ln.bias", "decoder.layers.2.ffn_layernorm.weight", "decoder.layers.2.ffn_layernorm.bias", "decoder.layers.2.self_attn.c_attn", "decoder.layers.2.encoder_attn.c_attn", "decoder.layers.3.self_attn_ln.weight", "decoder.layers.3.self_attn_ln.bias", "decoder.layers.3.cross_attn_ln.weight", "decoder.layers.3.cross_attn_ln.bias", "decoder.layers.3.ffn_layernorm.weight", "decoder.layers.3.ffn_layernorm.bias", "decoder.layers.3.self_attn.c_attn", "decoder.layers.3.encoder_attn.c_attn".

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../../train.py", line 537, in <module>
    distributed_main(cfg.distributed_training.device_id, main, cfg, kwargs)
  File "/mypath/fairseq/fairseq/distributed/utils.py", line 348, in distributed_main
    main(cfg, **kwargs)
  File "../../train.py", line 170, in main
    disable_iterator_cache=True,
  File "/mypath/utils/checkpoint_utils.py", line 254, in load_checkpoint
    reset_meters=reset_meters,
  File "/mypath/trainer.py", line 533, in load_checkpoint
    "please ensure that the architectures match.".format(filename)
Exception: Cannot load model parameters from checkpoint ../../scripts/biomedgpt_tiny.pt; please ensure that the architectures match.
```
@aaaaaannie Apologies, I missed your response earlier. It seems that `arch=ofa_tiny` might not be set up correctly. Which checkpoint did you download? We currently have three models available publicly, and if you downloaded the base model, you'll need to set `arch=ofa_base` instead.
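To double-check which variant a downloaded checkpoint actually is, the file itself can be inspected before re-running training. The sketch below is only a diagnostic aid, assuming a standard fairseq-style checkpoint layout; the exact config fields vary across fairseq versions (`"cfg"` in newer releases, `"args"` in older ones), and the key substrings are copied from the traceback above:

```python
import torch

# Checkpoint path taken from the log above; adjust to wherever it lives.
state = torch.load("../../scripts/biomedgpt_tiny.pt", map_location="cpu")

# Newer fairseq checkpoints record the training config under "cfg";
# older ones keep an argparse Namespace under "args". (Assumption: the
# checkpoint follows one of these two standard layouts.)
if state.get("cfg") is not None:
    print("arch:", state["cfg"]["model"].arch)
elif state.get("args") is not None:
    print("arch:", state["args"].arch)

# These substrings match the "unexpected" parameters from the traceback;
# if the checkpoint contains them but the configured arch does not build
# those layers, strict loading fails exactly as shown above.
suspects = [k for k in state["model"]
            if "attn_ln" in k or "ffn_layernorm" in k or ".c_attn" in k]
print(f"{len(suspects)} parameters not present in the configured arch")
```

If the printed arch is something other than `ofa_tiny` (e.g., `ofa_base`), setting `arch` accordingly in `pretrain_tiny.sh` should clear the unexpected-key errors.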
Hi, I am encountering an issue when trying to load a pre-trained OFAModel. The error message I receive is the same as the one shown above.
Environment Details:
- pip version: 21.2.4
- Fairseq version: installed from the OFA repository
Steps to Reproduce:
1. Installed Fairseq from the OFA repository.
2. Configured the environment and downloaded the necessary pre-trained datasets.
3. Attempted to load the OFAModel using the provided scripts (see the sketch after this list).
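For completeness, the failing load can be reproduced outside the training script with fairseq's checkpoint utilities, which makes it easier to iterate on arch settings. This is only a sketch: it assumes the repo's model/task modules are importable so that `OFAModel` gets registered, and the two local import names below are placeholders rather than the repo's actual module paths:

```python
from fairseq import checkpoint_utils

# Placeholder imports (hypothetical module names): loading OFAModel
# requires its @register_model / @register_task decorators to have run,
# which happens when the repo's model/task modules are imported.
import models  # noqa: F401
import tasks   # noqa: F401

ensemble, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    filenames=["../../scripts/biomedgpt_tiny.pt"],
)
print(type(ensemble[0]).__name__)  # expect OFAModel if the load succeeds
```

If this reproduces the same `Unexpected key(s)` error, the problem is the checkpoint/arch pairing rather than anything specific to the pretraining script.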