Closed · kig1929 closed this issue 2 months ago
Hi,
The `--train-epochs 3` parameter in gui-grounding-pre-training is just an approximate upper bound for selecting checkpoints.
We ultimately used the parameters in gui-grounding-pre-training and evaluated with checkpoint_step=20000, as in evaluation-on-screenspot. That is about 1.28 epochs (1 epoch = 1,000,000 / 64 = 15,625 steps), and reaching checkpoint-20000 took less than 20 hours on our 8×A100 setup.
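The arithmetic above can be sketched as a few lines of Python. The dataset size (~1,000,000 samples) and global batch size (64) are taken from the reply; both are assumptions about this particular setup, not constants of the codebase:

```python
# Step/epoch arithmetic from the reply above.
# Assumed: ~1,000,000 training samples, global batch size 64
# (devices x per-device batch x gradient accumulation).
dataset_size = 1_000_000
global_batch_size = 64

steps_per_epoch = dataset_size // global_batch_size   # 15625
checkpoint_step = 20_000
epochs_at_checkpoint = checkpoint_step / steps_per_epoch  # 1.28

print(steps_per_epoch)                      # 15625
print(round(epochs_at_checkpoint, 2))       # 1.28
```

So checkpoint-20000 corresponds to a bit more than one pass over the data, consistent with the "around 1 epoch" figure in the paper.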
Thanks for the quick reply! Great!
I'm pretraining the Qwen-VL-Chat model.
I processed the pretraining data (Table 6) by running the code as is. gui-grounding-pre-training specifies 3 epochs of training, but how many epochs is correct?
In the paper, Section 3.3 says around 1 epoch. (... We train Qwen-VL on the dataset we constructed (as described in Section 3.2) for about 10k steps (around 1 epoch) to obtain our GUI base model SeeClick. ...) Also, if I use the options in the code as is, training seems to take much longer than 24 hours, unlike in the paper.
I'll wait for your reply:)