yuanlaishihaoge opened 6 months ago
Did you modify the code to load the model in half precision?
Same issue. Does anybody know how to solve it?
I got the same issue
Update torch, or edit llama/generation.py:
```python
class Llama:
    @staticmethod
    def build(
        ckpt_dir: str,
        tokenizer_path: str,
        max_seq_len: int,
        max_batch_size: int,
        model_parallel_size: Optional[int] = None,
        seed: int = 1,
    ) -> "Llama":
        ...
        assert model_args.vocab_size == tokenizer.n_words
        if torch.cuda.is_bf16_supported():
            # torch.set_default_tensor_type(torch.cuda.BFloat16Tensor)
            torch.set_default_tensor_type(torch.cuda.HalfTensor)  # changed: force fp16 even when bf16 is supported
        else:
            torch.set_default_tensor_type(torch.cuda.HalfTensor)
        ...
```
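For reference, a minimal sketch of calling the patched build; the checkpoint and tokenizer paths are placeholders matching the repo's example layout:

```python
# Hypothetical invocation; paths are placeholders.
generator = Llama.build(
    ckpt_dir="Meta-Llama-3-8B/",
    tokenizer_path="Meta-Llama-3-8B/tokenizer.model",
    max_seq_len=512,
    max_batch_size=6,
)
```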
it works! thanks
Let me summarize it. The root cause is that `triu_tril_cuda_template` was only implemented for BFloat16 in torch 2.1.0 and later. Reference: https://github.com/huggingface/diffusers/issues/3453. So basically you have two methods to solve it:

1. Upgrade torch to 2.1.0 or later.
2. Fall back to fp16 by setting the default tensor type before the model is built:

```python
torch.set_default_tensor_type(torch.cuda.HalfTensor)
```
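If you want a single code path that works on both old and new torch, here is a minimal sketch (the `packaging` dependency and the exact 2.1.0 threshold are assumptions based on the reference above):

```python
import torch
from packaging import version

# Sketch: on torch < 2.1.0 the CUDA triu/tril kernels lack BFloat16
# support, so fall back to fp16 defaults there; newer torch can keep bf16.
if version.parse(torch.__version__) < version.parse("2.1.0"):
    torch.set_default_tensor_type(torch.cuda.HalfTensor)
```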
I have the same problem when I train Llama 3, in modeling_llama.py line 1095:

```python
causal_mask = torch.triu(causal_mask, diagonal=1)
```

I fixed it with:

```python
causal_mask = causal_mask.to(torch.float32)  # changed: cast up so triu has a CUDA kernel
causal_mask = torch.triu(causal_mask, diagonal=1)
causal_mask = causal_mask.to('cuda', dtype=torch.bfloat16)  # changed: cast back to bf16
```

I am pretraining the base model on Chinese data, but the results are very bad. I don't know whether this change damaged the precision. Can anyone help me?
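For what it's worth, the cast itself should be lossless: every bfloat16 value is exactly representable in float32, and `torch.triu` only zeroes elements, so the workaround is numerically exact and a quality regression likely comes from somewhere else. A hypothetical standalone check (not from the thread):

```python
import torch

# The bfloat16 -> float32 -> bfloat16 round trip is exact, because every
# bfloat16 value is exactly representable in float32.
x = torch.randn(1024, 1024, dtype=torch.bfloat16)
assert torch.equal(x.to(torch.float32).to(torch.bfloat16), x)
print("bf16 -> f32 -> bf16 round trip is exact")
```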
I'm using Llama via HuggingFace. Is there a good way to make this edit through their modules at all?
I tried doing:

```python
# this does not work
torch.set_default_tensor_type(torch.cuda.HalfTensor)
outputs = self.model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=self.tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.2,  # default is 0.6
    top_p=0.9,
)
```
but, as noted, it does not fix it.
You can fix it with another method:
```python
model_args['attn_implementation'] = 'flash_attention_2'
model = LlamaForCausalLM.from_pretrained(model_name, **model_args).eval()
```
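For context, a minimal self-contained sketch of that loading path (the model name is a placeholder, and the flash-attn package must be installed; with this backend transformers does not build the 4D causal mask, so the failing `torch.triu` call is avoided):

```python
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Placeholder model name; requires flash-attn to be installed.
model_name = "meta-llama/Meta-Llama-3-8B"
model = LlamaForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)
```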
Adding `flash_attention_2` works for me.
```shell
(algo_python38) root@4347dc632bb3:/data/data/llama3-main# torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir Meta-Llama-3-8B/ --tokenizer_path Meta-Llama-3-8B/tokenizer.model --max_seq_len 512 --max_batch_size 6
```