Closed — WinterStraw closed this issue 5 months ago
Qwen 1.5 does not have this problem at all.
For inference, edit LLM-train/Firefly/component/utils.py and change torch_dtype=torch.float16, to torch_dtype=torch.float32, and the error goes away. This incompatibility can show up on A10 GPUs.
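For context on why switching the dtype helps: float16 and bfloat16 differ mainly in exponent width, which sets how large a finite value each format can hold. A minimal, self-contained sketch (the formula below is the standard IEEE-754 largest-finite-value expression, not anything from the Firefly code):

```python
# Largest finite value of a binary float format, from its IEEE-754
# layout: (2 - 2**-mantissa_bits) * 2**max_unbiased_exponent.
def max_finite(exp_bits: int, mant_bits: int) -> float:
    max_unbiased_exp = 2 ** (exp_bits - 1) - 1
    return (2 - 2 ** -mant_bits) * 2.0 ** max_unbiased_exp

fp16_max = max_finite(5, 10)   # float16: 5 exponent bits, 10 mantissa bits
bf16_max = max_finite(8, 7)    # bfloat16: 8 exponent bits, 7 mantissa bits

print(fp16_max)   # 65504.0
print(bf16_max)   # ~3.39e38, the same exponent range as float32
```

Qwen2 checkpoints are distributed in bfloat16; casting activations into float16's much narrower range (max 65504) is what makes overflow to inf possible, while float32 or bfloat16 keeps the values finite.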
Hi, I ran into this problem too. Do you mean changing "fp16": true to "bf16": true in the config file qwen2-7b-sft-qlora.json? @WinterStraw
Yes. For training, switch fp16 to bf16; for inference, change torch_dtype.
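As a sketch, the training-side change discussed above would be this fragment of qwen2-7b-sft-qlora.json (only the two flags in question are shown; the rest of the file stays as-is):

```json
{
    "fp16": false,
    "bf16": true
}
```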
OK, thanks, I'll give it a try.
Hi, thanks for the method. When I applied it, I found that changing torch_dtype=torch.float16 to torch_dtype=torch.float32 makes inference very slow; changing it to torch_dtype=torch.bfloat16 fixes the problem and is much faster.
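The failure mode behind the traceback below can be reproduced without any model at all. A minimal NumPy sketch (the logit values are made up for illustration; float32 stands in for bfloat16, which shares its 8-bit exponent and which NumPy lacks natively):

```python
import numpy as np

# Hypothetical logits; magnitudes like this are routine inside a transformer.
logits = np.array([1.0, 5.0, 12.0])

# In float16, exp(12) ~= 162754 exceeds the fp16 maximum (~65504), so it
# overflows to inf, and normalizing by an inf sum yields nan -- exactly the
# "probability tensor contains inf, nan" error torch.multinomial raises.
p16 = np.exp(logits.astype(np.float16))
p16 = p16 / p16.sum()                      # inf / inf -> nan

# With float32's (and bfloat16's) wider exponent range the same softmax
# stays finite and sums to 1.
p32 = np.exp(logits.astype(np.float32))
p32 = p32 / p32.sum()

print(np.isnan(p16).any())   # True: the fp16 softmax broke
print(np.isnan(p32).any())   # False
```

This is why switching torch_dtype away from float16 fixes inference: the probabilities handed to the sampler stop containing inf/nan.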
If the loss comes out as 0 during training, changing fp16 to bf16 fixes it. But I don't know how to fix the inference side:

Loading checkpoint shards: 100%|███████████████████████████████████████████████| 4/4 [00:02<00:00, 1.54it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
User: hello
Traceback (most recent call last):
  File "/home/ecs-user/LLM-train/Firefly/script/chat/chat.py", line 153, in <module>
    main()
  File "/home/ecs-user/LLM-train/Firefly/script/chat/chat.py", line 136, in main
    outputs = model.generate(
  File "/home/ecs-user/miniconda3/envs/firefly/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ecs-user/miniconda3/envs/firefly/lib/python3.9/site-packages/transformers/generation/utils.py", line 1520, in generate
    return self.sample(
  File "/home/ecs-user/miniconda3/envs/firefly/lib/python3.9/site-packages/transformers/generation/utils.py", line 2653, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Thank you very much!