If you print(model) to see which modules are available, you'll obtain:
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(102400, 4096)
    (layers): ModuleList(
      (0-29): 30 x LlamaDecoderLayer(
        (self_attn): LlamaSdpaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=102400, bias=False)
  (generator): WrapperModule()
)
So the point is that transformer is not a valid module for this model.
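The module path mirrors the underlying Hugging Face architecture, so it differs between model families. A minimal sketch of the contrast (assuming nnsight's LanguageModel and standard Hugging Face checkpoints; the variable names are illustrative):

from nnsight import LanguageModel

# GPT-2-style checkpoints (GPT2LMHeadModel) keep their decoder blocks under transformer.h
gpt2 = LanguageModel("openai-community/gpt2", device_map="auto")
with gpt2.trace("Hey here is some text"):
    block0 = gpt2.transformer.h[0].output.save()

# Llama-style checkpoints (LlamaForCausalLM, which deepseek-llm-7b-base uses) keep them under model.layers
llama = LanguageModel("deepseek-ai/deepseek-llm-7b-base", device_map="auto")
with llama.trace("Hey here is some text"):
    block0 = llama.model.layers[0].output.save()

So for this checkpoint, model.model.layers[0] is the path to use where a GPT-2 checkpoint would use model.transformer.h[0].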
Thank you! I didn't understand that the API was not homogeneous across all models. After some playing around, here is what I got to work (maybe a reference for future viewers of this issue):
from transformers import AutoTokenizer
from nnsight import LanguageModel

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-llm-7b-base")
model = LanguageModel("deepseek-ai/deepseek-llm-7b-base", device_map='cuda', tokenizer=tokenizer)
with model.trace("Hey here is some text"):
    output_layer_1_mlp = model.model.layers[0].output.save()
# saved values are available once the trace context exits
print(output_layer_1_mlp)
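Note that the value saved this way is the whole decoder layer's output (for Llama layers typically a tuple whose first element is the hidden states), not just the MLP's. To capture the MLP output specifically, a minimal sketch using the mlp submodule shown in the printout above:

with model.trace("Hey here is some text"):
    layer0_out = model.model.layers[0].output.save()        # full decoder-layer output
    layer0_mlp_out = model.model.layers[0].mlp.output.save()  # just the MLP block's output
print(layer0_mlp_out)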
I'm not sure exactly what is going on, but I think it might be because I'm using DeepSeek?
With this code
I get this error: