zjunlp / KnowledgeCircuits

[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
http://knowledgecircuits.zjukg.cn/
MIT License

How to run it with multiple devices? #5

Open sev777 opened 1 day ago

sev777 commented 1 day ago
./RWKU/KnowledgeCircuits-main/KnowledgeCircuits-main/transformer_lens/components.py:625, in AbstractAttention.forward(self, query_input, key_input, value_input, past_kv_cache_entry, additive_attention_mask, attention_mask)
    616         result = self.hook_result(
    617             bnb.matmul_4bit(
    618                 z.reshape(z.shape[0], z.shape[1], self.cfg.d_model),
   (...)
    622             )
    623         )
    624     else:
--> 625         result = self.hook_result(
    626             einsum(
    627                 "batch pos head_index d_head, \
    628                     head_index d_head d_model -> \
    629                     batch pos head_index d_model",
    630                 z,
    631                 self.W_O,
    632             )
    633         ) 

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

If I set n_devices=2 in HookedTransformer.from_pretrained(model_name=LLAMA_2_7B_CHAT_PATH, device="cuda", n_devices=2, fold_ln=False, center_writing_weights=False, center_unembed=False), I get the error above.
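
For reference, the same loading call written out as a minimal script (a sketch only; LLAMA_2_7B_CHAT_PATH is assumed to point to a local copy of the Llama-2-7B-chat weights, and the prompt is just a placeholder):

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    model_name=LLAMA_2_7B_CHAT_PATH,  # local path to the Llama-2-7B-chat weights
    device="cuda",
    n_devices=2,                      # split the layers across two GPUs
    fold_ln=False,
    center_writing_weights=False,
    center_unembed=False,
)

# Any forward pass then hits the device-mismatch error shown in the traceback,
# raised in AbstractAttention.forward where z and self.W_O sit on different devices.
logits = model("The Eiffel Tower is located in")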

littlefive5 commented 1 day ago

It currently does not support multi-GPU; I will add this feature soon and will let you know when it is done. As a temporary workaround, you can move the tensors to the same device. Thanks!
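
A rough sketch of that temporary workaround (my own assumption, not an official fix): in transformer_lens/components.py, inside AbstractAttention.forward, move z onto W_O's device right before the output projection that appears in the traceback, so the einsum no longer mixes cuda:0 and cuda:1:

    else:
        result = self.hook_result(
            einsum(
                "batch pos head_index d_head, \
                    head_index d_head d_model -> \
                    batch pos head_index d_model",
                # move z to whichever device holds W_O so both einsum
                # operands live on the same GPU
                z.to(self.W_O.device),
                self.W_O,
            )
        )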