njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0

How to continue fine-tuning SeeClick with LoRA on Mind2Web/AITW data? #35

Closed ZJULiHongxin closed 1 month ago

ZJULiHongxin commented 1 month ago

Thanks for the great work! @njucckevin

I tried reproducing SeeClick's performance on AITW and Mind2Web but ran into a problem.

After fine-tuning Qwen-VL on the 1M samples mentioned in your paper, I obtained a LoRA checkpoint. Now I want to continue fine-tuning from this LoRA checkpoint on the downstream Mind2Web training data.

When I set `--model_name_or_path` to the LoRA checkpoint folder named "checkpoint-5200", the fine-tuning program raised:

```
OSError: /data/reproduce_seeclick/checkpoint-5200 does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/reproduce_seeclick/checkpoint-5200/tree/None' for available files.
```
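For context, a LoRA checkpoint folder contains only the adapter files (e.g. `adapter_config.json` and the adapter weights), not the full model's `config.json`, which is why Transformers cannot load it directly. Below is a minimal sketch of merging the adapter back into the base model with PEFT, assuming the checkpoint was saved in PEFT adapter format; the output directory name is made up:

```python
from peft import AutoPeftModelForCausalLM

# Load the base model together with the LoRA adapter (the adapter's
# adapter_config.json records the base model path), then fold the
# adapter weights into the base weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    "/data/reproduce_seeclick/checkpoint-5200",  # LoRA adapter dir
    device_map="cpu",
    trust_remote_code=True,
)
merged = model.merge_and_unload()

# The result is a plain Qwen-VL-style checkpoint, config.json included,
# that can be passed as --model_name_or_path. Tokenizer files may need
# to be copied over separately from the base model.
merged.save_pretrained("/data/reproduce_seeclick/checkpoint-5200-merged")
```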

I also tried merging the LoRA weights into the Qwen-VL base model and using the merged model as `--model_name_or_path`, but the fine-tuning program printed these warnings:

```
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:11<00:00, 1.16s/it]
transformer.h.0.attn.c_attn not satisfy lora
transformer.h.0.attn.c_attn not satisfy lora
transformer.h.0.attn.c_attn not satisfy lora
transformer.h.0.attn.c_attn not satisfy lora
```
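The warning text suggests the training script inspects module names when applying LoRA. One way to debug, sketched below with the same placeholder path, is to check which modules the original adapter was trained on, which PEFT records in `adapter_config.json`:

```python
import json

# Inspect which modules the LoRA adapter targets (placeholder path).
with open("/data/reproduce_seeclick/checkpoint-5200/adapter_config.json") as f:
    adapter_cfg = json.load(f)

# For Qwen-VL-style LoRA fine-tuning this is typically something like
# ["c_attn", "attn.c_proj", "w1", "w2"], but treat that as an assumption.
print(adapter_cfg.get("target_modules"))
```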

Could you also clarify what

> pretrain-ckpt: base model for fine-tuning, e.g. SeeClick-pretrain or Qwen-VL

means here?

njucckevin commented 1 month ago

Hi, the `pretrain-ckpt` parameter should be set to the file path of the SeeClick-pretrain or Qwen-VL checkpoint on your machine. You can try using an absolute path to see if the problem still occurs.
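As a quick sanity check (the path below is a placeholder), whatever you pass as the base checkpoint should be a complete Hugging Face checkpoint containing a `config.json`, which a LoRA adapter directory does not have:

```python
import os

# Placeholder: absolute path to the SeeClick-pretrain or Qwen-VL checkpoint.
ckpt = "/abs/path/to/Qwen-VL"

# A full checkpoint has config.json; a LoRA adapter dir only has adapter files.
assert os.path.isfile(os.path.join(ckpt, "config.json")), \
    f"{ckpt} is not a full checkpoint (config.json missing)"
```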