Closed: dbylynn closed this issue 2 years ago
Thanks for posting. We did not run as many trials on G-Ref as we did on RefCOCO, so it is possible that the variance (of the current LAVT implementation) is larger on G-Ref. We will try to find the old training logs.
Meanwhile, we have just updated the repository with a new implementation of LAVT that puts BERT inside the overall model, which can reduce training time significantly. We did a fresh run on G-Ref (Google split) with this new implementation, on 8 cards and still with an overall batch size of 32 (4 samples per card), and obtained 60.86 oIoU and 64.09 mIoU. We will update the README shortly on how to run this new implementation; basically, just change 'lavt' to 'lavt_one' when specifying the --model argument. Also, when trying this new implementation, it may be better to run on 8 cards instead of 4 as we did before.
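For reference, a launch command for the new variant might look like the following. This is a sketch, not an authoritative command: it assumes the `torch.distributed.launch` entry point and reuses the flags shown elsewhere in this thread, with `--model` switched to `lavt_one` and the per-card batch size adjusted for 8 GPUs.

```shell
# Hypothetical launch for the 'lavt_one' variant (BERT inside the model).
# 8 GPUs x 4 samples per card = overall batch size 32, as in the fresh run above.
python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 \
    train.py --model lavt_one --dataset refcocog --model_id refcocog \
    --batch-size 4 --lr 0.00005 --wd 1e-2 --splitBy google \
    --swin_type base \
    --pretrained_swin_weights ./pretrained_weights/swin_base_patch4_window12_384_22k.pth \
    --epochs 40 --img_size 480
```

The only change relative to the old recipe is the `--model` value; all other flags follow the configuration quoted in this thread.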
Hi, how much training time is saved by moving BERT into the overall model? Take RefCOCO, for example...
If your system's I/O is not a bottleneck or a limiting factor, then training time should be reduced by a lot. But in my own experiments on 4 cards, where disk I/O has always been a big problem, it only saved 7 hours of training time on RefCOCO.
Hi, I am having some problems reproducing your results. Our configuration is consistent with your release, but on the G-Ref (Google) val split we only got 59.03 Overall IoU and 61.83 Mean IoU during training, and 59.28 Overall IoU and 62.55 Mean IoU during testing.
Here is our configuration: --nproc_per_node 4 --master_port 12345 train.py --model lavt --dataset refcocog --model_id refcocog --batch-size 8 --lr 0.00005 --wd 1e-2 --split val --splitBy google --swin_type base --pretrained_swin_weights ./pretrained_weights/swin_base_patch4_window12_384_22k.pth --epochs 40 --img_size 480