njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0

How is Qwen fine-tuned for evaluation? #22

Closed: XuRui314 closed this issue 3 months ago

XuRui314 commented 3 months ago

Do Qwen and SeeClick both fine-tune the ViT and the LLM, or does Qwen only fine-tune the LLM?

njucckevin commented 3 months ago

SeeClick uses LoRA to fine-tune a customized set of parameters in both the ViT and the LLM; see finetune/finetune.py, lines 315-327.
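For readers unfamiliar with LoRA, here is a minimal pure-Python sketch of the idea (not SeeClick's code): each adapted layer keeps its pretrained weight W frozen and learns a low-rank update, so the output is W x + (alpha / r) · B (A x). B starts at zero, so the adapter changes nothing until training updates it. The layer names `visual.attn` (standing in for a ViT projection) and `llm.attn` (standing in for an LLM projection), as well as the rank and alpha values, are hypothetical; the actual target modules are whatever finetune/finetune.py configures.

```python
import random
from typing import Dict, List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: Matrix, r: int = 2, alpha: float = 4.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: not updated during fine-tuning
        self.scale = alpha / r
        # A (r x d_in) gets small random values; B (d_out x r) starts at zero,
        # so the adapted layer initially reproduces the pretrained layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def trainable_params(self) -> int:
        """Only A and B are trained; W is not counted."""
        return sum(len(row) for row in self.A) + sum(len(row) for row in self.B)

    def __call__(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical layers: one on the vision side, one on the language side.
layers: Dict[str, LoRALinear] = {
    "visual.attn": LoRALinear(identity(8), r=2),
    "llm.attn": LoRALinear(identity(8), r=2),
}

x = [float(i) for i in range(8)]
for name, layer in layers.items():
    # At initialization the adapter is a no-op, and it trains 32 values instead of 64.
    print(name, layer(x) == x, layer.trainable_params())
```

With rank 2 on an 8x8 layer, each adapter trains 2·8 + 8·2 = 32 values instead of the 64 in W; applying adapters on both the vision and language sides is what "fine-tuning both the ViT and the LLM with LoRA" amounts to.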

XuRui314 commented 3 months ago

Sorry, I didn't express myself clearly. Both Qwen and SeeClick are fine-tuned on downstream datasets (e.g., Table 4), and I'm wondering how the original Qwen model is fine-tuned in that case.

njucckevin commented 3 months ago

Both Qwen-VL and SeeClick fine-tune the ViT and the LLM using LoRA. In our downstream task experiments (Tables 2, 3, and 4), the only difference between SeeClick and Qwen-VL is the loaded checkpoint (i.e., SeeClick-pretrain vs. the original Qwen-VL); all other fine-tuning settings are identical.
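To make the comparison concrete, a hypothetical sketch of the setup described above: one shared fine-tuning configuration, with the two runs differing only in the checkpoint they load. The `FinetuneConfig` class, its field names, and the paths are illustrative placeholders, not arguments taken from the SeeClick scripts.

```python
from dataclasses import asdict, dataclass, replace
from typing import List


@dataclass(frozen=True)
class FinetuneConfig:
    # Hypothetical fields; the real scripts define their own arguments.
    checkpoint: str
    dataset: str = "downstream_task"  # placeholder name
    lora_on_vit: bool = True   # LoRA adapters on the vision encoder
    lora_on_llm: bool = True   # LoRA adapters on the language model


def differing_fields(a: FinetuneConfig, b: FinetuneConfig) -> List[str]:
    """Names of the fields whose values differ between two configs."""
    da, db = asdict(a), asdict(b)
    return sorted(k for k in da if da[k] != db[k])


qwen_vl = FinetuneConfig(checkpoint="path/to/Qwen-VL")
seeclick = replace(qwen_vl, checkpoint="path/to/SeeClick-pretrain")
print(differing_fields(qwen_vl, seeclick))  # ['checkpoint']
```

Deriving the SeeClick config with `dataclasses.replace` from the Qwen-VL one keeps every other setting shared by construction, which mirrors the controlled comparison described in the reply.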

XuRui314 commented 3 months ago

Thank you