huggingface / optimum-intel

🤗 Optimum Intel: Accelerate inference with Intel optimization tools
https://huggingface.co/docs/optimum/main/en/intel/index
Apache License 2.0

enable gpt2, falcon has core dump error in PagedAttention.single_quer… #979

Closed (jiqing-feng closed this issue 2 weeks ago)

jiqing-feng commented 3 weeks ago

Running Falcon through optimum-intel triggers a bus error (core dump) in PagedAttention.single_query_cached_kv_attention.

(torch_new) [jiqingfe@sprocean workloads]$ ./run.sh -t text-generation -m tiiuae/falcon-7b-instruct --model_dtype bfloat16 --input_tokens 32 --output_tokens 32 --num_beams 1 --batch_size 1 --warm_up_steps 1 --run_steps 1 --optimum_intel True
OMP: Warning #42: KMP_BLOCKTIME: "INF" is an invalid value; ignored.
OMP: Warning #39: KMP_BLOCKTIME value "INF" is invalid.
OMP: Info #104: KMP_BLOCKTIME value "200" will be used.
INFO:root:args = Namespace(model_id='tiiuae/falcon-7b-instruct', autocast_dtype='float32', ipex_optimize=False, jit=False, torch_compile=False, model_dtype='bfloat16', quant_type='None', backend='inductor', device='cpu', batch_size=1, num_beams=1, input_tokens=32, output_tokens=32, ipex_optimize_transformers=False, warm_up_steps=1, run_steps=1, optimum_intel=True)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  5.41it/s]
INFO:root:input tokens length is 34
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
./run.sh: line 208: 2544181 Bus error               (core dumped) numactl -C '0-'${CORES} --membind 0 python $task_name/run_$task_name.py --model_id $model_id --model_dtype $model_dtype --quant_type $quant_type --jit $jit --ipex_optimize $ipex_optimize --autocast_dtype $autocast_dtype --torch_compile $torch_compile --backend $backend --device $device --batch_size $batch_size --num_beams $num_beams --input_tokens $input_tokens --output_tokens $output_tokens --ipex_optimize_transformers $ipex_optimize_transformers --warm_up_steps $warm_up_steps --run_steps $run_steps --optimum_intel $optimum_intel
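For anyone who wants to reproduce this outside of run.sh and numactl, here is a minimal standalone sketch of the failing path. It assumes optimum-intel's IPEXModelForCausalLM wrapper is what the script uses when --optimum_intel True is passed (that mapping is my assumption; the exact prompt text is a placeholder, not the one from the benchmark):

```python
# Hypothetical standalone repro sketch for the reported bus error.
# The heavy model download/load is kept inside run_repro() so the
# configuration constants can be inspected without pulling weights.

MODEL_ID = "tiiuae/falcon-7b-instruct"  # model from the failing run
DTYPE_NAME = "bfloat16"                 # matches --model_dtype bfloat16
MAX_NEW_TOKENS = 32                     # matches --output_tokens 32


def run_repro():
    # Assumed entry point: optimum-intel's IPEX causal-LM wrapper.
    import torch
    from optimum.intel import IPEXModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = IPEXModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=getattr(torch, DTYPE_NAME)
    )

    # Placeholder prompt; the benchmark pads/trims to ~32 input tokens.
    inputs = tokenizer("Once upon a time, in a land far away,",
                       return_tensors="pt")

    # Greedy decoding (--num_beams 1, --batch_size 1); the crash is
    # reported inside PagedAttention.single_query_cached_kv_attention
    # during this generate() call.
    out = model.generate(**inputs, max_new_tokens=MAX_NEW_TOKENS, num_beams=1)
    print(tokenizer.decode(out[0], skip_special_tokens=True))


if __name__ == "__main__":
    run_repro()
```

If this standalone version also dumps core, that would rule out the numactl core/membind settings in run.sh as the trigger.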