Closed jiqing-feng closed 2 weeks ago
Falcon crashes with a bus error (core dumped) in PagedAttention.single_query_cached_kv_attention.
(torch_new) [jiqingfe@sprocean workloads]$ ./run.sh -t text-generation -m tiiuae/falcon-7b-instruct --model_dtype bfloat16 --input_tokens 32 --output_tokens 32 --num_beams 1 --batch_size 1 --warm_up_steps 1 --run_steps 1 --optimum_intel True
OMP: Warning #42: KMP_BLOCKTIME: "INF" is an invalid value; ignored.
OMP: Warning #39: KMP_BLOCKTIME value "INF" is invalid.
OMP: Info #104: KMP_BLOCKTIME value "200" will be used.
INFO:root:args = Namespace(model_id='tiiuae/falcon-7b-instruct', autocast_dtype='float32', ipex_optimize=False, jit=False, torch_compile=False, model_dtype='bfloat16', quant_type='None', backend='inductor', device='cpu', batch_size=1, num_beams=1, input_tokens=32, output_tokens=32, ipex_optimize_transformers=False, warm_up_steps=1, run_steps=1, optimum_intel=True)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 5.41it/s]
INFO:root:input tokens length is 34
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
./run.sh: line 208: 2544181 Bus error (core dumped) numactl -C '0-'${CORES} --membind 0 python $task_name/run_$task_name.py --model_id $model_id --model_dtype $model_dtype --quant_type $quant_type --jit $jit --ipex_optimize $ipex_optimize --autocast_dtype $autocast_dtype --torch_compile $torch_compile --backend $backend --device $device --batch_size $batch_size --num_beams $num_beams --input_tokens $input_tokens --output_tokens $output_tokens --ipex_optimize_transformers $ipex_optimize_transformers --warm_up_steps $warm_up_steps --run_steps $run_steps --optimum_intel $optimum_intel