@Coder-nlper Both chatglm-6b and chatglm2-6b should be quantizable; 12 GB of VRAM is enough to run the example dataset end to end. Did you modify the code? What does your environment look like? Try printing your model and take a look; mine looks like this:
```
In [12]: base_model
Out[12]:
ChatGLMForConditionalGeneration(
  (transformer): ChatGLMModel(
    (embedding): Embedding(
      (word_embeddings): Embedding(65024, 4096)
    )
    (rotary_pos_emb): RotaryEmbedding()
    (encoder): GLMTransformer(
      (layers): ModuleList(
        (0-27): 28 x GLMBlock(
          (input_layernorm): RMSNorm()
          (self_attention): SelfAttention(
            (query_key_value): Linear4bit(in_features=4096, out_features=4608, bias=True)
            (core_attention): CoreAttention(
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (dense): Linear4bit(in_features=4096, out_features=4096, bias=False)
          )
          (post_attention_layernorm): RMSNorm()
          (mlp): MLP(
            (dense_h_to_4h): Linear4bit(in_features=4096, out_features=27392, bias=False)
            (dense_4h_to_h): Linear4bit(in_features=13696, out_features=4096, bias=False)
          )
        )
      )
      (final_layernorm): RMSNorm()
    )
    (output_layer): Linear(in_features=4096, out_features=65024, bias=False)
  )
)
```
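For reference, a minimal sketch of how the model can be loaded in 4-bit so the linear layers show up as `Linear4bit` like above. This is not the exact script from this thread; the `bnb_4bit_*` settings below are assumptions:

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# Assumed 4-bit settings (NF4, fp16 compute, double quantization); adjust to taste.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModel.from_pretrained(
    "THUDM/chatglm2-6b",
    trust_remote_code=True,   # ChatGLM2 ships its own modeling code
    quantization_config=bnb_config,
    device_map="auto",
)

# If quantization took effect, query_key_value / dense / mlp layers print as Linear4bit.
print(base_model)
```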
Following https://github.com/THUDM/ChatGLM2-6B/issues/163, I looked for the quantized layers and found none, while the baichuan model does have them, and so does the first-generation chatglm-6b.
@Coder-nlper On my side they are there:
Found the problem: it has to be transformers==4.30.2. The transformers==4.31.0.dev I was using does not work.
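If in doubt, a trivial sanity check of the installed version before loading the model (not from the original script):

```python
import transformers

# The fix in this thread was pinning to 4.30.2; a 4.31.0.dev build broke quantization here.
print(transformers.__version__)  # expect "4.30.2"
```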
Thanks a lot!
@Coder-nlper A quick question: when I run prepare_model_for_kbit_training, I get this error:

```
File "/home/jinxiao/code/learn/llm-fruit/model.py", line 123, in create_model
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=training_args.gradient_checkpointing)
File "/home/jinxiao/miniconda3/envs/torch2.0_cu11.7/lib/python3.10/site-packages/peft/utils/other.py", line 86, in prepare_model_for_kbit_training
    model.enable_input_require_grads()
File "/home/jinxiao/miniconda3/envs/torch2.0_cu11.7/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1206, in enable_input_require_grads
    self._require_grads_hook = self.get_input_embeddings().register_forward_hook(make_inputs_require_grads)
File "/home/jinxiao/miniconda3/envs/torch2.0_cu11.7/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1223, in get_input_embeddings
    return base_model.get_input_embeddings()
File "/home/jinxiao/miniconda3/envs/torch2.0_cu11.7/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1225, in get_input_embeddings
    raise NotImplementedError
```

My transformers version is 4.30.2.
@xiaojinchuan You need to update the model folder to the latest version, especially the .py files inside it. If you pulled the model with git lfs, a plain git pull is enough.
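If updating the model files is not immediately possible, here is a purely hypothetical workaround sketch, not from this repo: in the peft version shown in the traceback the enable_input_require_grads() call sits behind the use_gradient_checkpointing flag, so one could skip that path and reproduce its two steps by hand, using the embedding path from the module tree printed above. Worth double-checking against your installed peft before relying on it:

```python
from peft import prepare_model_for_kbit_training

# Skip peft's gradient-checkpointing path, which needs get_input_embeddings()
# (the method the old modeling_chatglm.py does not implement).
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)

# Reproduce the skipped steps manually: make embedding outputs require grads
# and enable gradient checkpointing on the model itself.
def make_inputs_require_grads(module, inputs, output):
    output.requires_grad_(True)

model.transformer.embedding.word_embeddings.register_forward_hook(make_inputs_require_grads)
model.gradient_checkpointing_enable()
```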
Indeed, the latest version has fixed it. Many thanks!
@shuxueslpi Has the problem of chatglm2 not being quantizable been solved?
@xiaojinchuan Did you solve the problem of it not being quantizable?
@RuSignalFlag It can be quantized; update to transformers==4.30.2.
I get this warning:

```
You are loading your model in 8bit or 4bit but no linear modules were found in your model. this can happen for some architectures such as gpt2 that uses Conv1D instead of Linear layers. Please double check your model architecture, or submit an issue on github if you think this is a bug.
```

Then running prepare_model_for_kbit_training OOMs. From the log it looks like the ChatGLM2-6B weights were never quantized: inside prepare_model_for_kbit_training all the unquantized weights get cast to float32, and that is when it runs out of memory.

The model weights and code I am using are the latest as of today.
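To confirm whether quantization actually took effect before prepare_model_for_kbit_training upcasts anything, a quick diagnostic sketch (assuming the bitsandbytes Linear4bit/Linear8bitLt classes and a loaded model named `model`):

```python
import bitsandbytes as bnb

# Count quantized linear layers; 0 means the model loaded in full precision,
# which matches the "no linear modules were found" warning and explains the OOM
# when prepare_model_for_kbit_training casts the remaining weights to float32.
n_4bit = sum(isinstance(m, bnb.nn.Linear4bit) for m in model.modules())
n_8bit = sum(isinstance(m, bnb.nn.Linear8bitLt) for m in model.modules())
print(f"Linear4bit: {n_4bit}, Linear8bitLt: {n_8bit}")
print(f"is_loaded_in_4bit: {getattr(model, 'is_loaded_in_4bit', False)}")
```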