Open hiprince opened 4 days ago
I failed to run a ChatGLM model with ColossalAI 0.3.6. The backtrace is below.
KeyError                                  Traceback (most recent call last)
Cell In[4], line 112
    110 else:
    111     print('Skip launch colossalai')
--> 112 benchmark_inference(
    113     model_id,
    114     "fp16",
    115     max_input_len=max_input_len,
    116     max_output_len=max_seq_len,
    117     tp_size=tp_size,
    118     batch_size=batch_size)
    121 recorder.print()

Cell In[4], line 75, in benchmark_inference(model_id, dtype, max_input_len, max_output_len, tp_size, batch_size)
     63 model = model.to(torch.bfloat16)
     65 inference_config = InferenceConfig(
     66     dtype=dtype,
     67     max_batch_size=batch_size,
   (...)
     73     use_cuda_kernel=True,
     74 )
---> 75 engine = InferenceEngine(model, tokenizer, inference_config, verbose=False)
     77 generation_config = GenerationConfig(
     78     pad_token_id=tokenizer.pad_token_id,
     79     max_length=max_input_len + max_output_len,
     80     # max_new_tokens=args.max_output_len,
     81 )
     82 tokens=gen_tokens(tokenizer, dataset, dataset_format)

File ~/.local/lib/python3.10/site-packages/colossalai/inference/core/engine.py:75, in InferenceEngine.__init__(self, model_or_path, tokenizer, inference_config, verbose, model_policy)
     72 self.verbose = verbose
     73 self.logger = get_dist_logger(__name__)
---> 75 self.init_model(model_or_path, model_policy)
     77 self.generation_config = inference_config.to_generation_config(self.model_config)
     79 self.tokenizer = tokenizer

File ~/.local/lib/python3.10/site-packages/colossalai/inference/core/engine.py:148, in InferenceEngine.init_model(self, model_or_path, model_policy)
    146 else:
    147     model_type = "nopadding_" + self.model_config.model_type
--> 148 model_policy = model_policy_map[model_type]()
    150 pg_mesh = ProcessGroupMesh(self.inference_config.pp_size, self.inference_config.tp_size)
    151 tp_group = pg_mesh.get_group_along_axis(TP_AXIS)

KeyError: 'nopadding_chatglm'
Environment: ColossalAI 0.3.6, PyTorch 2.3.1, CUDA 12.1, NVIDIA driver 545
We have not yet adapted ChatGLM, but we will adapt these general models in the future.
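For anyone hitting the same error: the traceback shows the engine building the key "nopadding_" + model_type and indexing model_policy_map with it, so an unadapted model type surfaces as a bare KeyError. The snippet below is only a sketch of a guard you could run before constructing the engine. `resolve_policy` and the contents of the map here are hypothetical placeholders for illustration, not ColossalAI's API or its actual list of supported models.

```python
# Hypothetical stand-in for the registry seen in the traceback; the real
# model_policy_map lives inside ColossalAI and its contents are not shown here.
model_policy_map = {
    "nopadding_example": object,  # placeholder for a policy class
}


def resolve_policy(model_type: str, policy_map: dict):
    """Look up the policy for model_type, failing with a descriptive message."""
    # Same key construction as line 147 of engine.py in the traceback.
    key = "nopadding_" + model_type
    if key not in policy_map:
        raise ValueError(
            f"No inference policy registered for {key!r}; "
            f"available keys: {sorted(policy_map)}"
        )
    return policy_map[key]


if __name__ == "__main__":
    try:
        resolve_policy("chatglm", model_policy_map)
    except ValueError as err:
        print(err)
```

With a check like this the unsupported model is reported by name together with the keys that are registered, instead of failing deep inside engine initialization.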