uclanlp / visualbert

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

Pre-training on other BERT models #12

Closed · Muennighoff closed this issue 4 years ago

Muennighoff commented 4 years ago

Thanks for the great repo and your efforts! Two quick questions:

  1. Is there anything that speaks against pre-training VisualBERT with ALBERT instead of BERT on COCO and then fine-tuning it on downstream tasks?
  2. I haven't found exact details on what resources are needed for pre-training, beyond the paper's note that it took less than a day on COCO. How many hours did it take, and which GPUs did you use?

liunian-harold-li commented 4 years ago

Hi,

  1. I would imagine that using ALBERT would still work; a rough sketch of what the swap might look like is given below.
  2. For most experiments, I used 4 1080Ti GPUs with 12GB of memory each. Pre-training on COCO takes less than a day, maybe 18-20 hours; sorry, I cannot recall the exact amount of time needed. For the experiments on VCR, I used 4 V100s with 16GB of memory.
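
A minimal sketch of what swapping the text encoder for ALBERT might look like, written against the HuggingFace transformers API rather than this repo's actual code; `VISUAL_DIM`, the projection layer, and the dummy inputs are illustrative assumptions:

```python
# Sketch only (not the repo's code): pairing ALBERT with region features.
import torch
import torch.nn as nn
from transformers import AlbertModel, AlbertTokenizer

MODEL_NAME = "albert-base-v2"
VISUAL_DIM = 2048  # e.g. Faster R-CNN region features (assumed)

class VisualAlbert(nn.Module):
    def __init__(self):
        super().__init__()
        self.albert = AlbertModel.from_pretrained(MODEL_NAME)
        # ALBERT factorizes its embeddings, so inputs_embeds must match
        # config.embedding_size (128), not hidden_size (768).
        emb_dim = self.albert.config.embedding_size
        self.visual_proj = nn.Linear(VISUAL_DIM, emb_dim)

    def forward(self, input_ids, attention_mask, visual_feats):
        # Text token embeddings from ALBERT's own embedding table.
        text_embeds = self.albert.embeddings.word_embeddings(input_ids)
        # One "visual token" per detected region, projected to the same space.
        vis_embeds = self.visual_proj(visual_feats)
        inputs_embeds = torch.cat([text_embeds, vis_embeds], dim=1)
        vis_mask = torch.ones(
            visual_feats.shape[:2],
            dtype=attention_mask.dtype,
            device=attention_mask.device,
        )
        mask = torch.cat([attention_mask, vis_mask], dim=1)
        return self.albert(inputs_embeds=inputs_embeds, attention_mask=mask)

tokenizer = AlbertTokenizer.from_pretrained(MODEL_NAME)
enc = tokenizer("A dog chasing a ball", return_tensors="pt")
regions = torch.randn(1, 36, VISUAL_DIM)  # dummy detector features
model = VisualAlbert()
out = model(enc["input_ids"], enc["attention_mask"], regions)
print(out.last_hidden_state.shape)  # (1, text_len + 36, hidden_size)
```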


Muennighoff commented 4 years ago

Okay, I see. If I were to implement it with ALBERT, I would have to redo the pre-training on COCO, right? I cannot just fine-tune using ALBERT, since the model pre-trained on COCO is built on BERT uncased.
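
For context on the backbone mismatch described above, a small hedged illustration (HuggingFace transformers, not this repo's code) of why a BERT-uncased checkpoint cannot simply be loaded into an ALBERT backbone:

```python
# Compare parameter names of BERT and ALBERT: they largely do not match,
# so a BERT-based COCO checkpoint cannot be reused for an ALBERT model.
from transformers import AlbertModel, BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
albert = AlbertModel.from_pretrained("albert-base-v2")

bert_keys = set(bert.state_dict().keys())
albert_keys = set(albert.state_dict().keys())

# BERT keeps 12 independent encoder layers; ALBERT shares one layer group and
# factorizes its embeddings, so key names (and shapes) largely differ.
print(len(bert_keys & albert_keys), "parameter names in common")
print(len(albert_keys - bert_keys), "ALBERT parameters absent from a BERT checkpoint")
```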