openvinotoolkit / openvino.genai

Run Generative AI models using native OpenVINO C++ API
Apache License 2.0

issue with chatglm2-6b #934

Open QuPengfei opened 2 weeks ago

QuPengfei commented 2 weeks ago

I saw an issue with chatglm2-6b.

It runs successfully with numactl -m 0 -C 0-23. It fails with numactl -m 0 -C 0-31, 0-47, or 0-55.
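To narrow down where the failure starts, the core ranges above could be swept in one go. The following is a minimal sketch that only builds the command lines; the helper names `numactl_prefix` and `sweep_commands` and the placeholder benchmark arguments are assumptions for illustration, not part of the benchmark tooling.

```python
import shlex


def numactl_prefix(last_core, mem_node=0):
    """Return numactl arguments binding memory to one node and CPUs 0..last_core."""
    return ["numactl", "-m", str(mem_node), "-C", f"0-{last_core}"]


def sweep_commands(base_cmd, last_cores=(23, 31, 47, 55)):
    """Map each core range from the report to the full command line to run."""
    return {f"0-{c}": numactl_prefix(c) + list(base_cmd) for c in last_cores}


if __name__ == "__main__":
    # Placeholder arguments (assumption); substitute the real benchmark options.
    base = ["python", "benchmark.py", "-m", "MODEL_DIR", "-d", "CPU"]
    for cores, cmd in sweep_commands(base).items():
        print(cores, "->", shlex.join(cmd))
```

Adding intermediate values to `last_cores` (for example 24 through 30) would show the first core count at which the run fails.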

It can be reproduced with INT8_ASYM or 4BIT_MAXIMUM quantization.

Here is the command used for quantization: python3 convert.py --model_id $model_path -c $DATA_TYPE --output_dir $target_path

Here is the command line used for inference:
numactl -m 0 -C 0-47 python benchmark.py -m /app/savedmodels/THUDM/chatglm2-6b/pytorch/dldt/compressed_weights/OV_FP32-INT8_ASYM/ -d CPU -n 3 -p "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is" -r /app/output/chatglm2-6b-4BIT_MAXIMUM-16-256-256.1.csv -ic 256 -mc 2 -bs 16 --torch_compile_backend openvino --fuse_decoding_strategy -od /app/output --genai

[ INFO ] ==SUCCESS FOUND==: use_case: text_gen, model_type: chatglm2-6b
[ INFO ] OV Config={'CACHE_DIR': ''}
[ INFO ] OPENVINO_TORCH_BACKEND_DEVICE=CPU
[ INFO ] Model path=/app/savedmodels/THUDM/chatglm2-6b/pytorch/dldt/compressed_weights/OV_FP32-INT8_ASYM, openvino runtime version: 2024.4.0-16579-c3152d32c9c-releases/2024/4
[ INFO ] Pipeline initialization time: 0.98s
[ INFO ] Numbeams: 1, benchmarking iter nums(exclude warm-up): 3, prompt nums: 1, prompt idx: [0]
[ INFO ] [warm-up] Input text: It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is
[ INFO ] [warm-up] Batch_size=16, all input token size after padding: 256 16, all max_output_token_size: 256 16
[ ERROR ] An exception occurred
[ INFO ] Traceback (most recent call last):
  File "/app/benchmark.py", line 856, in main
    iter_data_list, pretrain_time = CASE_TO_BENCH[model_args['use_case']](model_path, framework, args.device, model_args, args.num_iters)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/benchmark.py", line 462, in run_text_generation_benchmark
    text_gen_fn(input_text, num, model, tokenizer, args, iter_data_list, md5_list, prompt_idx_list[idx], bench_hook, model_precision, proc_id)
  File "/app/benchmark.py", line 348, in run_text_generation_genai
    generation_result = model.generate(input_text_list, max_new_tokens=max_gen_tokens, num_beams=args["num_beams"])
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_cpu/src/graph.cpp:1243:
Node __module.transformer.encoder.layers.0.self_attention.core_attention/aten::scaled_dot_product_attention/ScaledDotProductAttention of type ScaledDotProductAttentionWithKVCache
Check 'B == B_state' failed at src/plugins/intel_cpu/src/nodes/scaled_attn.cpp:1393:
beam idx batch: 14 is not equal to batch of state: 16
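Going only by the message text, the failing check compares the batch dimension of the beam index input (14) with the batch dimension of the stored KV-cache state (16, matching -bs 16). The sketch below restates that invariant in plain Python for illustration; it is not the OpenVINO implementation, and the function name `check_beam_idx_matches_state` is hypothetical.

```python
def check_beam_idx_matches_state(beam_idx, state_batch):
    """Raise if the number of beam indices differs from the cached state's batch size."""
    b = len(beam_idx)
    if b != state_batch:
        raise RuntimeError(
            f"beam idx batch: {b} is not equal to batch of state: {state_batch}"
        )
    return b


if __name__ == "__main__":
    # Consistent case: 16 sequences in the batch, 16 entries in the state.
    check_beam_idx_matches_state(list(range(16)), 16)
    # Case from the log: only 14 beam indices arrive for a state of batch 16.
    try:
        check_beam_idx_matches_state(list(range(14)), 16)
    except RuntimeError as err:
        print(err)
```

Why two sequences go missing from the beam index when more cores are used is the open question; this sketch only shows what the check enforces.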

peterchen-intel commented 1 week ago

@QuPengfei Can you please share the output of "lscpu", "numactl -H", and "lscpu -e"? Also, can this be reproduced with other models?
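The three outputs requested above could be gathered into one report with a short script. This is a minimal sketch; the `collect` helper and `DIAGNOSTIC_COMMANDS` list are hypothetical names, and the script assumes the tools are on PATH (it notes any that are missing instead of failing).

```python
import shutil
import subprocess

# The commands requested in the comment above.
DIAGNOSTIC_COMMANDS = [["lscpu"], ["numactl", "-H"], ["lscpu", "-e"]]


def collect(commands=DIAGNOSTIC_COMMANDS):
    """Run each command and return one text report with a header per command."""
    report = []
    for cmd in commands:
        header = "$ " + " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            # Tool not installed; record that instead of raising.
            report.append(f"{header}\n(command not found)")
            continue
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        report.append(f"{header}\n{result.stdout}{result.stderr}")
    return "\n\n".join(report)


if __name__ == "__main__":
    print(collect())
```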