This is expected. The two sets of matrices you see are AdamW optimizer states. The correct option to convert a train state with optimizer state to Hugging Face format is `--load_checkpoint="trainstate_params::/path/to/trainstate"` instead of `--load_checkpoint="params::/path/to/trainstate"`. Please see the checkpoint documentation for more details.
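For concreteness, a sketch of what the corrected invocation could look like, reusing the paths from the command quoted at the end of this thread (treat them as placeholders for your own checkpoint):

```
python -m EasyLM.models.llama.convert_easylm_to_hf \
    --load_checkpoint='trainstate_params::../open_llama_3m/d521488bc3194913871dd8f3617e8dbf/streaming_train_state_245000' \
    --tokenizer_path='../llama-13b-lora-hf/tokenizer.model' \
    --model_size='3m' \
    --output_dir='../openllama-3m-hf'
```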
I created my own `LLAMA_STANDARD_CONFIGS` entry in `llama_model.py`:

```python
'3m': {
    'vocab_size': 49953,
    'hidden_size': 64,
    'intermediate_size': 128,
    'num_hidden_layers': 4,
    'num_attention_heads': 8,
    'max_sequence_length': 2048,
    'initializer_range': 0.02,
    'rms_norm_eps': 1e-6,
    'use_cache': True,
    'tie_word_embeddings': False,
},
```
and the same config in `convert_easylm_to_hf.py`:

```python
'3m': {
    'dim': 64,
    'intermediate_size': 128,
    'n_layers': 4,
    'n_heads': 8,
    'norm_eps': 1e-6,
},
```
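Since the same architecture is now described twice, a quick sanity check can catch the two entries drifting apart. This is just a hypothetical snippet; the field-name correspondence between the two configs is my reading of how they line up:

```python
# Hypothetical check that the two hand-written '3m' entries agree.
easylm_3m = {
    'vocab_size': 49953, 'hidden_size': 64, 'intermediate_size': 128,
    'num_hidden_layers': 4, 'num_attention_heads': 8,
}
converter_3m = {'dim': 64, 'intermediate_size': 128, 'n_layers': 4, 'n_heads': 8}

# Assumed correspondence: hidden_size <-> dim, num_hidden_layers <-> n_layers,
# num_attention_heads <-> n_heads.
assert easylm_3m['hidden_size'] == converter_3m['dim']
assert easylm_3m['intermediate_size'] == converter_3m['intermediate_size']
assert easylm_3m['num_hidden_layers'] == converter_3m['n_layers']
assert easylm_3m['num_attention_heads'] == converter_3m['n_heads']
```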
Because I use a different tokenizer.model file, not the official LLaMA tokenizer.model, it causes mismatched-size problems:
File "/root/miniconda3/envs/EasyLM/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3278, in _load_pretrained_model raise RuntimeError(f"Error(s) in loading state_dict for {model.class.name}:\n\t{error_msg}") RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM: size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([49954, 64]) from checkpoint, the shape in current model is torch.Size([32000, 64]). size mismatch for lm_head.weight: copying a param with shape torch.Size([49954, 64]) from checkpoint, the shape in current model is torch.Size([32000, 64]). You may consider adding
ignore_mismatched_sizes=True
in the modelfrom_pretrained
method.
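The 32000 in the error is simply the transformers default: when the conversion script builds a `LlamaConfig` without an explicit `vocab_size`, it falls back to the stock LLaMA vocabulary, so the freshly constructed model expects `[32000, 64]` embeddings. A one-liner to confirm the default:

```python
from transformers import LlamaConfig

# LlamaConfig falls back to the original LLaMA vocabulary size by default.
print(LlamaConfig().vocab_size)  # 32000
```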
However, when I changed the source code to pass `ignore_mismatched_sizes=True`:

```python
model = LlamaForCausalLM.from_pretrained(tmp_model_path, torch_dtype=torch.float16, ignore_mismatched_sizes=True)
```

it caused another error:
```
Traceback (most recent call last):
  File "/root/miniconda3/envs/EasyLM/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/EasyLM/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/ketadb/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 297, in <module>
    mlxu.run(main)
  File "/root/miniconda3/envs/EasyLM/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/root/miniconda3/envs/EasyLM/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/data/ketadb/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 289, in main
    write_model(
  File "/data/ketadb/EasyLM/EasyLM/models/llama/convert_easylm_to_hf.py", line 205, in write_model
    model = LlamaForCausalLM.from_pretrained(tmp_model_path, torch_dtype=torch.float16, ignore_mismatched_sizes=True)
  File "/root/miniconda3/envs/EasyLM/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2881, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/EasyLM/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3218, in _load_pretrained_model
    mismatched_keys += _find_mismatched_keys(
  File "/root/miniconda3/envs/EasyLM/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3141, in _find_mismatched_keys
    and state_dict[checkpoint_key].shape != model_state_dict[model_key].shape
KeyError: 'model.layers.1.self_attn.q_proj.weight'
```
You might want to change the vocab size in the conversion script in addition to the model file.
It is easy to fix this problem: I added `vocab_size` to `LLAMA_CONFIGS`:

```python
'3b': {
    'vocab_size': 32000,
    'dim': 3200,
    'intermediate_size': 8640,
    'n_layers': 26,
    'n_heads': 32,
    'norm_eps': 1e-6,
},
```

and built the `LlamaConfig` with that `vocab_size`:

```python
config = LlamaConfig(
    hidden_size=dim,
    intermediate_size=params["intermediate_size"],
    num_attention_heads=params["n_heads"],
    num_hidden_layers=params["n_layers"],
    rms_norm_eps=params["norm_eps"],
    vocab_size=params["vocab_size"],
)
```

Then I tested it, and it worked.
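As a quick smoke test (hypothetical; the output directory comes from the command at the end of this thread, and 49953 is the vocab size from the '3m' config above), one can check that the converted model picked up the custom vocabulary:

```python
import torch
from transformers import LlamaForCausalLM

# Verify that the embedding table now matches the custom tokenizer
# instead of the 32000-row default.
model = LlamaForCausalLM.from_pretrained('../openllama-3m-hf', torch_dtype=torch.float16)
print(model.get_input_embeddings().weight.shape)  # expect torch.Size([49953, 64])
```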
I used the `pretrain_llama_7b.sh` script to pretrain an EasyLM model, which produced multiple `streaming_train_state` and checkpoint files. Then I used the `convert_easylm_to_hf.py` script to convert the `streaming_train_state` files to HF format, and I ran into a problem with `loaded[f"transformer.h.{layer_i}.attention.wq.kernel"]`.

Error messages:

The error happened in `convert_easylm_to_hf.py`, around lines 136~151. When I printed the `loaded` dict, I found a weird problem: there are two sets of transformer matrices in one `streaming_train_state` file.
One set has keys beginning with `params.params.`, and the other has keys beginning with `opt_state.1.0.mu.`.
However, I could not find any code referring to `opt_state.1.0.mu` anywhere in the project. If I want to convert `streaming_train_state` files, I need to change the source to something like `loaded[f"params.params.transformer.h.{layer_i}.attention.wq.kernel"]`.
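Those `mu` arrays are consistent with the AdamW optimizer states mentioned in the reply at the top of the thread: EasyLM's optimizers are built on optax, and optax's Adam state mirrors the whole parameter tree, which is why the checkpoint carries a second copy of every weight matrix. A minimal sketch under that assumption (the exact `opt_state.1.0` nesting is my guess at how the nested state tuple flattens into keys):

```python
import jax.numpy as jnp
import optax

# Toy parameter tree: AdamW keeps `mu`/`nu` arrays shaped like the params.
params = {'wq': {'kernel': jnp.zeros((64, 64))}}
opt_state = optax.adamw(learning_rate=1e-3).init(params)

# optax.adamw is a chain of transforms; the Adam moments live in one element.
adam_state = opt_state[0]
print(type(adam_state).__name__, adam_state._fields)  # ScaleByAdamState ('count', 'mu', 'nu')
print(adam_state.mu['wq']['kernel'].shape)            # (64, 64): a duplicate of each weight
```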
This is the command I ran:

```
python -m EasyLM.models.llama.convert_easylm_to_hf \
    --load_checkpoint='params::../open_llama_3m/d521488bc3194913871dd8f3617e8dbf/streaming_train_state_245000' \
    --tokenizer_path='../llama-13b-lora-hf/tokenizer.model' \
    --model_size='3m' \
    --output_dir='../openllama-3m-hf'
```