zhegan27 / VILLA

Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
https://arxiv.org/pdf/2006.06195.pdf
MIT License

When will the adversarial training code of pretraining in indomain dataset be released? #1

Closed youngfly11 closed 3 years ago

youngfly11 commented 3 years ago

Hi Zhe,

Thanks for your excellent work. I recently wanted to reproduce some VILLA results and run pre-training on the in-domain datasets. Is it possible to simply adapt the adversarial training code in train_vqa_adv.py to the pre-training stage? Is there any specific configuration for adversarial training during pre-training?

zhegan27 commented 3 years ago

Sorry for the late response due to the holiday season. Yes, you can basically follow the adversarial training code in train_vqa_adv.py to get the adversarial pre-training code ready. We also plan to release the pre-training code. Thanks for the reminder, and please stay tuned; we will get this done asap.

Meanwhile, you can also try it yourself. There is nothing specific you need to worry about: basically, follow the pre-training configuration file from the UNITER code base, and then add the adversarial-training-related hyper-parameters. Hope it helps!
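The suggested adaptation boils down to wrapping the pre-training forward/backward pass in a FreeLB-style inner loop over embedding perturbations, as train_vqa_adv.py does for VQA. A minimal sketch of that loop is below; the function and hyper-parameter names (`adv_steps`, `adv_lr`, `adv_init_mag`, `adv_max_norm`) are illustrative rather than the repo's exact flags.

```python
# FreeLB-style adversarial step on input embeddings: run K perturbed
# forward/backward passes, accumulating parameter gradients while
# taking ascent steps on the perturbation delta.
import torch

def adv_train_step(model, embeds, labels, loss_fn,
                   adv_steps=3, adv_lr=1e-3,
                   adv_init_mag=0.0, adv_max_norm=0.0):
    # Optionally start from a small random perturbation.
    delta = torch.zeros_like(embeds)
    if adv_init_mag > 0:
        delta.uniform_(-adv_init_mag, adv_init_mag)
    delta.requires_grad_()

    for _ in range(adv_steps):
        # Divide by adv_steps so the accumulated gradient averages
        # over the K perturbed passes.
        loss = loss_fn(model(embeds + delta), labels) / adv_steps
        loss.backward()  # accumulates into both model params and delta

        # Gradient-ascent step on delta, normalized by its grad norm.
        grad = delta.grad.detach()
        delta = (delta + adv_lr * grad / (grad.norm() + 1e-8)).detach()
        if adv_max_norm > 0:
            # Optional projection back into the L-inf ball.
            delta = delta.clamp(-adv_max_norm, adv_max_norm)
        delta.requires_grad_()
    return delta
```

For pre-training, `loss_fn` would be whichever task loss the current batch carries (MLM, ITM, etc.); the perturbation logic itself does not change. After the loop, a single `optimizer.step()` uses the accumulated gradients.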

Best, Zhe

youngfly11 commented 3 years ago

Hi Zhe,

Thanks for your response. I have a follow-up question. When I run the pre-training code in this VILLA repo, training is very slow with the default setting (workers=4), and GPU utilization is very low. When I set workers=8 or higher, it raises the following error. Did you see the same behavior during training? What was your training speed in pre-training?

(screenshot of the error traceback)

zhegan27 commented 3 years ago

Thanks for trying our code. Empirically, we did not run into the problem you mentioned. How low is your GPU utilization?

We ran the pre-training code on our internal Microsoft GPU clusters and did not observe low utilization. It may be caused by your RAM size, disk speed, or other constraints. When you tried the fine-tuning code, did you also see the same low-utilization problem? Thanks.

Best, Zhe