zjunlp / KnowledgeCircuits

[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
http://knowledgecircuits.zjukg.cn/
MIT License

How to run it with multiple devices? #5

Open sev777 opened 1 day ago

sev777 commented 1 day ago
./RWKU/KnowledgeCircuits-main/KnowledgeCircuits-main/transformer_lens/components.py:625, in AbstractAttention.forward(self, query_input, key_input, value_input, past_kv_cache_entry, additive_attention_mask, attention_mask)
    616         result = self.hook_result(
    617             bnb.matmul_4bit(
    618                 z.reshape(z.shape[0], z.shape[1], self.cfg.d_model),
   (...)
    622             )
    623         )
    624     else:
--> 625         result = self.hook_result(
    626             einsum(
    627                 "batch pos head_index d_head, \
    628                     head_index d_head d_model -> \
    629                     batch pos head_index d_model",
    630                 z,
    631                 self.W_O,
    632             )
    633         ) 

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

If I set n_devices=2 in HookedTransformer.from_pretrained(model_name=LLAMA_2_7B_CHAT_PATH, device="cuda", n_devices=2, fold_ln=False, center_writing_weights=False, center_unembed=False), I get the error above.
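
For reference, the same loading call written out as a minimal script (a sketch only; LLAMA_2_7B_CHAT_PATH is assumed to point to a local copy of the Llama-2-7B-chat weights, and the prompt is just a placeholder):

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    model_name=LLAMA_2_7B_CHAT_PATH,  # local path to the Llama-2-7B-chat weights
    device="cuda",
    n_devices=2,                      # split the layers across two GPUs
    fold_ln=False,
    center_writing_weights=False,
    center_unembed=False,
)

# Any forward pass then hits the device-mismatch error shown in the traceback,
# raised in AbstractAttention.forward where z and self.W_O sit on different devices.
logits = model("The Eiffel Tower is located in")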

littlefive5 commented 1 day ago

It currently does not support multi-GPU; I will add this feature soon and will let you know when it is done. As a temporary workaround, you can move the tensors to the same device. Thanks!
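
A rough sketch of that temporary workaround (my own assumption, not an official fix): in transformer_lens/components.py, inside AbstractAttention.forward, move z onto W_O's device right before the output projection that appears in the traceback, so the einsum no longer mixes cuda:0 and cuda:1:

    else:
        result = self.hook_result(
            einsum(
                "batch pos head_index d_head, \
                    head_index d_head d_model -> \
                    batch pos head_index d_model",
                # move z to whichever device holds W_O so both einsum
                # operands live on the same GPU
                z.to(self.W_O.device),
                self.W_O,
            )
        )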