microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
https://aka.ms/GeneralAI

BEiT-3 Training Batch Procedure #879

Open PeterDykas opened 2 years ago

PeterDykas commented 2 years ago

When training BEiT-3 on batches of different modalities, did you do three forward passes per iteration, one for each type of data (image, text, image-text), or did you batch them all together into a single forward pass?

From my understanding, doing three separate forward passes and then calculating the loss has the advantage of reducing the padding needed, which may help both accuracy and speed. On the other hand, a single combined forward pass might also be faster, since you run one pass instead of three.
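To make the padding argument concrete, here is a small illustration with made-up sequence lengths (not from the paper): padding each modality's batch to its own max length processes fewer padded tokens than padding one mixed batch to the global max.

```python
# Hypothetical sequence lengths per modality (illustrative numbers only).
text_lens = [12, 17, 24]        # text-only examples
image_lens = [197, 197, 197]    # image patch sequences: fixed length
pair_lens = [221, 209, 215]     # image-text pairs

def padded_tokens(lengths):
    """Tokens processed when a batch is padded to its own max length."""
    return max(lengths) * len(lengths)

# Separate per-modality batches: each padded only to its own max length.
separate = sum(padded_tokens(l) for l in (text_lens, image_lens, pair_lens))

# One mixed batch: everything padded to the global max length.
mixed = padded_tokens(text_lens + image_lens + pair_lens)

print(separate, mixed)  # mixed > separate: more padding to compute over
```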

wenhui0924 commented 2 years ago

Hi @PeterDykas,

Thanks for the question. We did three forward passes, one each for images, texts, and image-text pairs, given the different max lengths of the different modalities.
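For reference, a minimal sketch of what such a training step might look like (the batch keys, `model`, and `criterion` are hypothetical placeholders, not the actual BEiT-3 code, which is at aka.ms/beit3):

```python
import torch

def train_step(model, criterion, optimizer, batches):
    """One iteration with a separate forward/backward pass per modality.

    `batches` is a dict with hypothetical keys "image", "text", and
    "image_text"; each value is an (inputs, labels) pair padded only to
    that modality's own max length.
    """
    optimizer.zero_grad()
    total = 0.0
    for modality in ("image", "text", "image_text"):
        inputs, labels = batches[modality]
        logits = model(inputs)        # forward pass for this modality only
        loss = criterion(logits, labels)
        loss.backward()               # gradients accumulate across the three passes
        total += loss.item()
    optimizer.step()                  # single parameter update per iteration
    return total
```

In this sketch, gradients from the three passes are accumulated before one optimizer step, so each pass only pads within its own modality.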

chaochen99 commented 2 years ago

Dear @wenhui0924,

I wonder how you mixed the three types of data when their sequence lengths are not equal.

Thanks!

donglixp commented 1 year ago

The code and pre-trained models of BEiT-3 can be found at aka.ms/beit3.