intel / llm-on-ray

Pretrain, finetune and serve LLMs on Intel platforms with Ray
Apache License 2.0

AssertionError: BF16 weight prepack needs the cpu support avx512bw, avx512vl and avx512dq #255

Closed · iodone closed this issue 2 months ago

iodone commented 2 months ago

The above exception was the direct cause of the following exception:

```
Traceback (most recent call last):
  File "/home/work/app/llm-on-ray/llm_on_ray/finetune/finetune.py", line 502, in <module>
    main()
  File "/home/work/app/llm-on-ray/llm_on_ray/finetune/finetune.py", line 496, in main
    results = trainer.fit()
```

xwu99 commented 2 months ago

@iodone Your CPU doesn't support some AVX512 instructions; could you check with `lscpu | grep avx512`? BF16 is not supported if AVX512 is not available; in that case you need to set the dtype to float (much slower).
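Besides `lscpu`, the check can be scripted. A minimal sketch (not part of llm-on-ray) that parses `/proc/cpuinfo` on Linux for the three features named in the assertion:

```python
# Minimal sketch: report which AVX512 features required by IPEX's
# BF16 weight prepack are missing, by parsing /proc/cpuinfo (Linux only).
REQUIRED = {"avx512bw", "avx512vl", "avx512dq"}

def missing_avx512_flags(cpuinfo_path="/proc/cpuinfo"):
    """Return the subset of REQUIRED flags the CPU does not advertise."""
    with open(cpuinfo_path) as f:
        for line in f:
            # The "flags" line lists every feature bit the kernel detected.
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return REQUIRED - flags
    return REQUIRED  # no flags line found; assume nothing is supported
```

On an AVX512-capable CPU this returns an empty set; otherwise it returns the missing flag names, which is the condition that triggers the assertion above.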

iodone commented 2 months ago

> @iodone Your CPU doesn't support some AVX512 instructions; could you check with `lscpu | grep avx512`? BF16 is not supported if AVX512 is not available; in that case you need to set the dtype to float (much slower).

Thank you for the reply. I checked, and the CPU does not support AVX512. Where in the code should I set the dtype to float?

xwu99 commented 2 months ago

> Thank you for the reply. I checked, and the CPU does not support AVX512. Where in the code should I set the dtype to float?

You can set that in the YAML file.

rain7996 commented 2 months ago

> You can set that in the YAML file.

How do I set it? In what format? Just `dtype: torch.float` in the YAML file?

xwu99 commented 2 months ago

> How do I set it? In what format? Just `dtype: torch.float` in the YAML file?

Hi, my mistake. `float` is only supported for inference; only `fp16` and `bf16` are supported for finetuning.

https://github.com/intel/llm-on-ray/blob/main/docs/finetune_parameters.md

| Parameter | Default | Description |
| --- | --- | --- |
| mixed_precision | no | Whether or not to use mixed precision training. Choose from "no", "fp16", "bf16". Default is "no" if not set. |
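So on a CPU without AVX512, the workaround is to leave mixed precision off so training runs in full fp32. An illustrative finetune-config fragment (the surrounding `Training` section name is assumed, not copied verbatim from the repo; `mixed_precision` is the documented parameter):

```yaml
# Illustrative fragment of a llm-on-ray finetune config (structure assumed).
# "no" disables bf16/fp16 mixed precision so training stays in fp32,
# which works without AVX512 but is much slower.
Training:
  mixed_precision: "no"
```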