njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0
139 stars 8 forks

Hello author! Error when loading the finetuned model for testing #31

Closed dmndxld closed 1 month ago

dmndxld commented 1 month ago

Hello! I fine-tuned my own AITW dataset with the finetune_lora_ds.sh script under finetune, using the original SeeClick base model. When testing with aitw_test.py under agent_tasks, loading the model fails with the following error:

```
Loading checkpoint shards: 100%|██████████| 10/10 [00:04<00:00, 2.38it/s]
Traceback (most recent call last):
  File "/home/u2020010349/.conda/envs/FLAGENT/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/u2020010349/.conda/envs/FLAGENT/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/u2020010349/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/u2020010349/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/u2020010349/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/u2020010349/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/u2020010349/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/u2020010349/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/u2020010349/share/yc/FL_agent/SeeClick-main/agent_tasks/lh_make/test_my.py", line 74, in <module>
    model = AutoPeftModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True).eval()  # load with lora checkpoint
  File "/home/u2020010349/.conda/envs/FLAGENT/lib/python3.8/site-packages/peft/auto.py", line 123, in from_pretrained
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/u2020010349/.conda/envs/FLAGENT/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 847, in from_pretrained
    return tokenizer_class.from_pretrained(
  File "/home/u2020010349/.conda/envs/FLAGENT/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
  File "/home/u2020010349/.conda/envs/FLAGENT/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/u2020010349/.cache/huggingface/modules/transformers_modules/checkpoint-1589/tokenization_qwen.py", line 120, in __init__
    super().__init__(**kwargs)
  File "/home/u2020010349/.conda/envs/FLAGENT/lib/python3.8/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/home/u2020010349/.cache/huggingface/modules/transformers_modules/checkpoint-1589/tokenization_qwen.py", line 229, in _add_tokens
    if surface_form not in SPECIAL_TOKENS + self.IMAGE_ST:
AttributeError: 'QWenTokenizer' object has no attribute 'IMAGE_ST'
```

I searched the Qwen repo for related issues and found a similar one. It suggests adding `super().__init__(**kwargs)` at line 136 of tokenization_qwen.py. Strangely, in my file that line sits at line 120, as the first statement of QWenTokenizer's `__init__`. Following the suggestion, I moved it below the IMAGE_ST definition, but then a new error occurs:

```
Traceback (most recent call last):
  File "/home/u2020010349/share/yc/FL_agent/SeeClick-main/agent_tasks/lh_make/test_my.py", line 74, in <module>
    model = AutoPeftModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True).eval()  # load with lora checkpoint
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
    size mismatch for base_model.model.transformer.wte.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151860, 4096]).
    size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151860, 4096]).
```

Did you run into anything similar at the time? Any ideas that could help me solve this? Thanks!
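For readers hitting the same AttributeError: the traceback shows that the base tokenizer's `__init__` calls `_add_tokens`, which the subclass overrides and which reads `self.IMAGE_ST` before the Qwen tokenizer has defined it. A minimal, self-contained sketch of that ordering pitfall follows; the class names only mirror the shape of the traceback and are not the real transformers/Qwen-VL code.

```python
class BaseTokenizer:
    """Stand-in for transformers' PreTrainedTokenizer."""

    def __init__(self, **kwargs):
        # The base __init__ immediately calls _add_tokens, which the
        # subclass overrides and which reads self.IMAGE_ST.
        self._add_tokens(["<img>", "hello"])

    def _add_tokens(self, tokens):
        return tokens


class BrokenQWenTokenizer(BaseTokenizer):
    """super().__init__ runs before IMAGE_ST exists -> AttributeError."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)            # triggers _add_tokens too early
        self.IMAGE_ST = ("<img>", "</img>")

    def _add_tokens(self, tokens):
        return [t for t in tokens if t not in self.IMAGE_ST]


class FixedQWenTokenizer(BaseTokenizer):
    """Define IMAGE_ST first, then let the base class add tokens."""

    def __init__(self, **kwargs):
        self.IMAGE_ST = ("<img>", "</img>")   # attribute exists before ...
        super().__init__(**kwargs)            # ... _add_tokens ever runs

    def _add_tokens(self, tokens):
        return [t for t in tokens if t not in self.IMAGE_ST]
```

Moving `super().__init__(**kwargs)` below the `IMAGE_ST` assignment, as in the Qwen issue mentioned above, corresponds to the `Fixed` variant here.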

dmndxld commented 1 month ago
```python
model = AutoPeftModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True).eval()  # load with lora checkpoint
```

The error is raised on this line during loading.

njucckevin commented 1 month ago

I didn't run into this problem at the time. I took a look: the tokenization_qwen.py in the ckpt I uploaded differs in a few places from the tokenization_qwen.py currently on Qwen-VL's official Hugging Face, e.g. the difference below. That change isn't mine; Qwen-VL most likely updated its code after I downloaded the ckpt. You could diff the two files and debug from there.

[Screenshot 2024-05-11 19:00:45]
dmndxld commented 1 month ago

> I didn't run into this problem at the time. […] You could diff the two files and debug from there.

Thanks for the quick reply! My guess is that the Qwen model itself was changed? After finetuning with Qwen and then applying that fix, it works. But after finetuning the SeeClick base model, the model size still seems to mismatch.
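For context on this "mismatch": the 151936-vs-151860 difference in the wte and lm_head rows means the checkpoint's embedding and output tables were saved with a larger vocabulary than the model they are being loaded into, i.e. the tokenizer/vocab used at finetune time disagrees with the one used at load time. A toy illustration of the same error class, with the numbers scaled down (this is not the SeeClick code):

```python
import torch
from torch import nn

# Toy stand-ins for the wte/lm_head tables: the "checkpoint" was saved with a
# larger vocabulary (1000 rows) than the freshly built model expects (990),
# mirroring the 151936-vs-151860 mismatch in the report at a smaller scale.
saved = nn.Embedding(num_embeddings=1000, embedding_dim=8)
fresh = nn.Embedding(num_embeddings=990, embedding_dim=8)

try:
    fresh.load_state_dict(saved.state_dict())
except RuntimeError as err:
    # Same error class as in the issue: "size mismatch for weight: copying a
    # param with shape ... from checkpoint, the shape in current model is ..."
    print(type(err).__name__)
```

So the fix cannot live in the tokenizer file alone: the base model whose shapes are built at load time must have the same vocab size as the one the checkpoint was finetuned from.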

carachu1 commented 1 month ago

> I didn't run into this problem at the time. […] You could diff the two files and debug from there.

> Thanks for the quick reply! […] But after finetuning the SeeClick base model, the model size still seems to mismatch.

Is this tokenization_qwen.py file the one from Qwen-VL-Chat? I tried the method you mentioned, but running `model = AutoPeftModelForCausalLM.from_pretrained(model_path, device_map="cuda", trust_remote_code=True).eval()` still raises `AttributeError: 'QWenTokenizer' object has no attribute 'IMAGE_ST'`.

dmndxld commented 1 month ago

> Is this tokenization_qwen.py file the one from Qwen-VL-Chat? I tried the method you mentioned, but running that line still raises `AttributeError: 'QWenTokenizer' object has no attribute 'IMAGE_ST'`.

No. To be precise, that file is automatically saved into your finetune checkpoint folder when finetuning runs; try modifying the copy there.
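A small sketch of that debugging step: diff the tokenization_qwen.py saved inside the finetune checkpoint folder against the base model's copy to see exactly what drifted. The file contents below are stand-ins written to a temp directory so the sketch runs; in practice, point the two paths at your real base-model and checkpoint files.

```python
import difflib
import tempfile
from pathlib import Path

# Stand-in files; replace with e.g. SeeClick-base/tokenization_qwen.py and
# output/checkpoint-1589/tokenization_qwen.py (hypothetical paths).
tmp = Path(tempfile.mkdtemp())
base = tmp / "base_tokenization_qwen.py"
ckpt = tmp / "ckpt_tokenization_qwen.py"
base.write_text(
    "class QWenTokenizer:\n"
    "    def __init__(self, **kwargs):\n"
    "        super().__init__(**kwargs)\n"
    "        self.IMAGE_ST = (...)\n"
)
ckpt.write_text(
    "class QWenTokenizer:\n"
    "    def __init__(self, **kwargs):\n"
    "        self.IMAGE_ST = (...)\n"
    "        super().__init__(**kwargs)\n"
)

# unified_diff shows exactly which lines moved or changed between the copies.
diff = list(difflib.unified_diff(
    base.read_text().splitlines(), ckpt.read_text().splitlines(),
    fromfile="base", tofile="checkpoint", lineterm=""))
print("\n".join(diff))
```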

zlsjsj commented 3 weeks ago

> I didn't run into this problem at the time. […] You could diff the two files and debug from there.

Hi, I'm running into this problem now as well. Which version or commit id of Qwen did you download on your side?

njucckevin commented 2 weeks ago
[Screenshot 2024-06-19 16:57:12]

Perhaps this Qwen-VL commit caused it? I believe I downloaded Qwen-VL from Hugging Face around December 2023. You could also try the approaches mentioned above~