Hi,
Did you follow the fine-tuning steps provided in our readme_agent?
I'm not sure about your training script, but in our fine-tuning code (actually the official Qwen-VL code), the image path should be wrapped in <img> </img> tags, as in https://github.com/njucckevin/SeeClick/blob/5067f6bcde12e507cff7dab676b0df6b71d23b79/agent_tasks/mind2web_process.py#L101
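Roughly, the prompt built there looks like the following sketch (not the exact repo code; the path, instruction, and previous actions are placeholders):

```python
# Sketch of the Qwen-VL style prompt built in mind2web_process.py:
# the screenshot path must be wrapped in <img> </img> so the model
# reads it as an image reference rather than plain text.
img_path = "/root/data/Mind2Web_related/qwen_image/example.jpg"  # placeholder path
instruction = "What are the romantic reggae musics from BCD Studio ..."  # placeholder
previous_actions = ""  # placeholder

prompt = (
    f"Picture 1: <img>{img_path}</img>\n"
    "Please generate the next move according to the ui screenshot, "
    f"instruction and previous actions. Instruction: {instruction} "
    f"Previous actions: {previous_actions}"
)
print(prompt)
```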
I followed the fine-tuning steps. The <img> tag may have been mistakenly rendered as an image icon by GitHub in my comment; my data is in the correct format.
I tried using the released checkpoint on Hugging Face, but still cannot reproduce the test results, so I think it's a data processing problem. 😂
How do you deal with the large size of the Mind2Web images, or do you just use the raw images?
The processing details for Mind2Web images are in Appendix C.4 of the paper. We kept the 1920*1080 resolution for the screenshots, and we provide these screenshots in this repo.
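For example, a quick way to sanity-check the downloaded screenshots (a rough sketch; the directory path is just a placeholder):

```python
# Sanity check (sketch): confirm the Mind2Web screenshots keep the
# original 1920x1080 resolution before building the fine-tuning data.
from pathlib import Path
from PIL import Image

screenshot_dir = Path("/root/data/Mind2Web_related/qwen_image")  # placeholder location
for img_file in sorted(screenshot_dir.glob("*.jpg")):
    with Image.open(img_file) as img:
        if img.size != (1920, 1080):  # PIL gives (width, height)
            print(f"unexpected size {img.size}: {img_file.name}")
```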
Thanks a lot for sharing, I will try it.
My data format looks like this:
'Picture 1: /root/data/Mind2Web_related/qwen_image/013781df-4391-4533-bcb1-15f6819064f6-79c4a963-4aa9-49c1-9257-6b0d5069c551.jpg\n Please generate the next move according to the ui screenshot, instruction and previous actions. Instruction: What are the romantic reggae musics from BCD Studio that can be used in tik tok series in andorra. Previous actions:'
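For reference, here is a sketch of how I believe one such sample sits in the Qwen-VL fine-tuning JSON ("conversations" format); the id, path, and assistant answer are placeholders, and the <img> tags are the ones GitHub hid in the snippet above:

```python
# Sketch of a single Qwen-VL fine-tuning sample in the official
# "conversations" JSON format; all values here are placeholders.
import json

sample = {
    "id": "mind2web_example_0",  # placeholder id
    "conversations": [
        {
            "from": "user",
            "value": (
                "Picture 1: <img>/root/data/Mind2Web_related/qwen_image/example.jpg</img>\n"
                "Please generate the next move according to the ui screenshot, "
                "instruction and previous actions. Instruction: ... Previous actions:"
            ),
        },
        # Target action string the model should learn to produce (placeholder).
        {"from": "assistant", "value": "..."},
    ],
}

with open("mind2web_train_example.json", "w") as f:
    json.dump([sample], f, indent=2)
```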
For the Mind2Web images, I tried using both the raw size and a cropped size (the raw images are very large).
I didn't modify the fine-tuning code, but the final results are not good. Could you give me some advice on how to solve this problem? Thanks.