njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0

Which weights are updated while pretraining and finetuning? #12

Closed · kig1929 closed this 4 months ago

kig1929 commented 4 months ago

Thanks for sharing this great work :)

When dividing Qwen-VL into the ViT, adapter, and LM, could you clarify which weights are updated during pretraining and which during finetuning?

Also, a question for confirmation: in Figure 1(a) of the paper, the ViT and VL Adapter are not included in the LVLM (yellow box). I think the yellow box depicts only the LM, but is it meant to be the LVLM?

njucckevin commented 4 months ago

Sorry for the slightly late reply. The customized LoRA parameters for pre-training and fine-tuning are specified in finetune.py at line 317.

Figure 1(a) is just a schematic. We do not actually introduce any new architecture; all of the modules on the right side together make up the LVLM (i.e., Qwen-VL).
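
For a concrete picture, here is a minimal sketch (assumed, not the repository's exact code) of how LoRA target modules are typically collected by name-substring matching and passed to peft. The keyword list and variable names are placeholders for illustration; the authoritative list lives in finetune.py around line 317, and `model` is assumed to be the already loaded Qwen-VL.

```python
# Minimal sketch (assumption, not the repository's exact code): select LoRA
# target modules by name-substring matching and wrap the model with peft.
import torch
from peft import LoraConfig, get_peft_model

def find_target_modules(model, name_keywords):
    """Return full names of all Linear submodules whose path contains any keyword."""
    targets = set()
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear) and any(kw in name for kw in name_keywords):
            targets.add(name)
    return sorted(targets)

# Hypothetical keywords covering LM attention/MLP layers and the visual encoder.
keywords = ["attn.c_attn", "attn.c_proj", "mlp.w1", "mlp.w2", "visual"]
target_modules = find_target_modules(model, keywords)  # `model` = loaded Qwen-VL

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=target_modules,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```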

njucckevin commented 4 months ago

finetune.py applies LoRA fine-tuning to the parameters whose names match the patterns listed at line 317. You may print target_modules at line 327 to check which modules are updated in the ViT, adapter, and LLM.
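
As a quick sanity check, one illustrative option is to bucket the trainable parameters by component after LoRA wrapping. The substrings used to identify each component below are assumptions about Qwen-VL's module naming, not confirmed names from the repository.

```python
# Illustrative check: bucket trainable parameters by component to see which of
# ViT / adapter / LM receive LoRA updates. The name substrings are assumptions
# about Qwen-VL's module naming and may need adjusting.
from collections import Counter

def trainable_params_by_component(model):
    buckets = Counter()
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        if "attn_pool" in name:    # assumed name of the VL adapter (resampler)
            buckets["adapter"] += param.numel()
        elif "visual" in name:     # assumed prefix of the ViT encoder
            buckets["ViT"] += param.numel()
        else:
            buckets["LM"] += param.numel()
    return buckets

print(trainable_params_by_component(model))  # `model` is the LoRA-wrapped Qwen-VL
```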

kig1929 commented 4 months ago

Thank you for answer!

Just to double check, are you saying that the target weights updated in pretraining and finetuning are the same?

njucckevin commented 4 months ago

Yes. Because the downstream-task screens differ from the pre-training screens, we find that applying LoRA to the visual encoder as well gives slightly better performance when fine-tuning.

kig1929 commented 4 months ago

Thanks for your prompt reply :) It helped me a lot.