/usr/lib/python3.10/inspect.py:288: FutureWarning: torch.distributed.reduce_op is deprecated, please use torch.distributed.ReduceOp instead
return isinstance(object, types.FunctionType)
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:24: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
10/18/2024 03:08:59 - WARNING - __main__ - trust_remote_code is set, there is no guarantee this model works properly and it may fail
10/18/2024 03:08:59 - INFO - __main__ - Single-device run.
2024-10-18 03:09:02 [WARNING][auto_accelerator.py:422] Auto detect accelerator: HPU_Accelerator.
2024-10-18 03:09:02 [INFO][utils.py:201] Preparation started.
2024-10-18 03:09:02 [INFO][quantize.py:160] Start to prepare model with fp8_quant.
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
PT_HPU_EAGER_PIPELINE_ENABLE = 1
PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
---------------------------: System Configuration :---------------------------
Num CPU Cores : 112
CPU RAM : 1056428680 KB
2024-10-18 03:09:10 [INFO][utils.py:201] Preparation end.
Initializing inference mode
10/18/2024 03:09:11 - INFO - __main__ - Args: Namespace(buckets=[16, 32, 64, 128, 189, 284], output_file='acc_bloomz7b_bs1_measure.txt', tasks=['hellaswag', 'lambada_openai', 'piqa', 'winogrande'], limit_iters=None, device='hpu', model_name_or_path='/DISK0/bloomz-7b1', bf16=True, max_new_tokens=100, max_input_tokens=0, batch_size=1, warmup=3, n_iterations=5, local_rank=0, use_kv_cache=True, use_hpu_graphs=True, dataset_name=None, column_name=None, do_sample=False, num_beams=1, top_k=None, penalty_alpha=None, trim_logits=True, seed=27, profiling_warmup_steps=0, profiling_steps=0, profiling_record_shapes=False, prompt=None, bad_words=None, force_words=None, assistant_model=None, peft_model=None, num_return_sequences=1, token=None, model_revision='main', attn_softmax_bf16=True, output_dir=None, bucket_size=128, bucket_internal=True, dataset_max_samples=-1, limit_hpu_graphs=False, show_graphs_count=False, reuse_cache=False, verbose_workers=False, simulate_dyn_prompt=None, reduce_recompile=False, use_flash_attention=True, flash_attention_recompute=True, flash_attention_causal_mask=False, flash_attention_fast_softmax=True, book_source=False, torch_compile=False, ignore_eos=True, temperature=1.0, top_p=1.0, const_serialization_path=None, trust_remote_code=True, parallel_strategy='none', input_embeds=False, run_partial_dataset=False, load_quantized_model_with_autogptq=False, disk_offload=False, load_quantized_model_with_inc=False, local_quantized_inc_model_path=None, quant_config='./quantization_config/maxabs_measure.json', world_size=0, global_rank=0)
10/18/2024 03:09:11 - INFO - __main__ - device: hpu, n_hpu: 0, bf16: True
10/18/2024 03:09:11 - INFO - __main__ - Model initialization took 13.380s
Traceback (most recent call last):
  File "/home/intel/optimum-habana/examples/text-generation/run_lm_eval.py", line 231, in <module>
    main()
  File "/home/intel/optimum-habana/examples/text-generation/run_lm_eval.py", line 198, in main
    lm_tasks = lm_eval.tasks.get_task_dict(args.tasks)
  File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/tasks/__init__.py", line 415, in get_task_dict
    task_name_dict = {
  File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/tasks/__init__.py", line 416, in <dictcomp>
    task_name: get_task(task_name)()
  File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/base.py", line 513, in __init__
    self.download(data_dir, cache_dir, download_mode)
  File "/home/intel/optimum-habana/examples/text-generation/tmp/lm-evaluation-harness-0bf683b4e6a9df359b3156ba9ba8d62bdd47e0c0/lm_eval/base.py", line 542, in download
    self.dataset = datasets.load_dataset(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2606, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2277, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1923, in dataset_module_factory
    raise e1 from None
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1875, in dataset_module_factory
    can_load_config_from_parquet_export = "DEFAULT_CONFIG_NAME" not in f.read()
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
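A note on the error itself: byte 0x8b at position 1 is the second byte of the gzip magic number (0x1f 0x8b). One plausible reading, not confirmed here, is that the file datasets opened as text at load.py line 1875 was still gzip-compressed when it was written to disk. The sketch below only demonstrates that decoding gzip bytes as UTF-8 raises this same kind of error. The helper names and the sample script content are hypothetical and are not part of datasets or lm_eval.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(raw: bytes) -> bool:
    """Return True if the buffer starts with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


def decode_error(raw: bytes):
    """Try to decode as UTF-8; return the UnicodeDecodeError, or None on success."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Hypothetical stand-in for a dataset script that arrived gzip-compressed.
script_text = b'DEFAULT_CONFIG_NAME = "default"\n'
payload = gzip.compress(script_text)

err = decode_error(payload)
print("gzip magic present:", looks_like_gzip(payload))
print("decode error:", err)
print("decompresses to original:", gzip.decompress(payload) == script_text)
```

If the cached dataset script on the failing machine also starts with these two bytes, that would point at the download path (for example the configured HF_ENDPOINT) rather than at the quantization setup.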
System Info
Reproduction
HF_ENDPOINT=https://hf-mirror.com \
QUANT_CONFIG=./quantization_config/maxabs_measure.json python run_lm_eval.py \
-o acc_yi34b_bs1_measure.txt \
--model_name_or_path /mnt/disk1/Yi-34B \
--attn_softmax_bf16 \
--use_hpu_graphs \
--trim_logits \
--use_kv_cache \
--bucket_size=128 \
--bucket_internal \
--use_flash_attention \
--flash_attention_recompute \
--bf16 \
--batch_size 1 \
--trust_remote_code
Expected behavior
The evaluation tasks load without error and the quantization measurement run completes successfully.