Open deafTim opened 1 month ago
It seems like this model has a different architecture. Is there a way to fix that?
These are the key names in the layers. Is there maybe a way to convert them?
Transformer (from your code):

```
layers.0.attention.wo.weight
layers.0.attention.wqkv.weight
layers.0.attention_norm.weight
layers.0.feed_forward.w1.weight
layers.0.feed_forward.w2.weight
layers.0.feed_forward.w3.weight
layers.0.ffn_norm.weight
```
llama-3.2-1B:

```
layers.0.input_layernorm.weight
layers.0.mlp.down_proj.weight
layers.0.mlp.gate_proj.weight
layers.0.mlp.up_proj.weight
layers.0.post_attention_layernorm.weight
layers.0.self_attn.k_proj.weight
layers.0.self_attn.o_proj.weight
layers.0.self_attn.q_proj.weight
layers.0.self_attn.v_proj.weight
```

This happens when I use meta-llama/Llama-3.2-1B. Can it be fixed?
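For context, here is a rough conversion sketch I'd imagine, based only on the two key lists above. The one-to-one renames follow directly from the lists; fusing `q_proj`/`k_proj`/`v_proj` into a single `wqkv` matrix by row-wise concatenation is a guess on my part. I've shown it with NumPy arrays to keep it self-contained; for a real checkpoint the values would be torch tensors and `torch.cat` would replace `np.concatenate`.

```python
import re
import numpy as np

# One-to-one key renames taken from the two lists in this issue.
RENAMES = {
    "input_layernorm": "attention_norm",
    "post_attention_layernorm": "ffn_norm",
    "self_attn.o_proj": "attention.wo",
    "mlp.gate_proj": "feed_forward.w1",
    "mlp.down_proj": "feed_forward.w2",
    "mlp.up_proj": "feed_forward.w3",
}

QKV_RE = re.compile(r"layers\.(\d+)\.self_attn\.([qkv])_proj\.weight")

def convert(hf_state):
    """Remap HF-style Llama keys to the Transformer-style keys above."""
    out = {}
    qkv = {}  # layer index -> {"q": array, "k": array, "v": array}
    for key, value in hf_state.items():
        m = QKV_RE.fullmatch(key)
        if m:
            # Collect the separate q/k/v projections per layer for fusion.
            qkv.setdefault(m.group(1), {})[m.group(2)] = value
            continue
        new_key = key
        for old, new in RENAMES.items():
            new_key = new_key.replace(old, new)
        out[new_key] = value
    for layer, parts in qkv.items():
        # Guess: wqkv is q, k, v stacked row-wise (torch.cat(..., dim=0)).
        out[f"layers.{layer}.attention.wqkv.weight"] = np.concatenate(
            [parts["q"], parts["k"], parts["v"]], axis=0
        )
    return out
```

Note that with grouped-query attention (as in Llama 3.2), `k_proj` and `v_proj` have fewer rows than `q_proj`, so the fused `wqkv` would be the row-wise stack of all three rather than three equal blocks.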