njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0
139 stars 8 forks source link

Any plan to release pretraining code? #19

Closed ltzheng closed 3 months ago

ltzheng commented 3 months ago

Thank you for your great work. It seems that the pretraining code is not yet open. Do you plan to open-source it?

njucckevin commented 3 months ago

Hi, sorry for a bit late. The script used for pre-training is actually the same as the script used for downstream agent task fine-tuning. Simply organize the data as described in readme_data.md according the Qwen-VL format for --data-path, and replace pretrain-ckpt with the directory of pre-trained Qwen-VL model. Do you mean the detailed data processing code we used?

ltzheng commented 3 months ago

Thanks for prompt response. Yes I am looking for the data processing code to generate the json file needed for training.

njucckevin commented 3 months ago

I will try to update this part of the code in several days.

njucckevin commented 3 months ago

The pre-training scripts are now released :)

ltzheng commented 3 months ago

Great! Thank you very much.