salesforce / ALBEF

Code for ALBEF: a new vision-language pre-training method
BSD 3-Clause "New" or "Revised" License

How much memory is needed for pre-training #60

Open 2292384454 opened 2 years ago

2292384454 commented 2 years ago

Hello, I am trying to pre-train on 4 RTX3090 (24G) GPUs, but I always run out of memory, even after reducing the batch_size to 4. Could you tell me how much memory you used and how long pre-training took?

LiJunnan1992 commented 2 years ago

I used 8 A100 GPUs with 40G of memory each, and training takes 3-4 days on the 4M dataset. You may want to try fp16 training or gradient checkpointing to reduce memory usage.
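A minimal sketch of what that suggestion could look like in PyTorch. The model, tensor sizes, and loss below are placeholders, not ALBEF's actual pre-training code (the real model classes live in `models/model_pretrain.py`); it only illustrates combining `torch.cuda.amp` mixed precision with gradient checkpointing:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy 12-layer model standing in for ALBEF's encoders.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(12)]).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # rescales the loss so fp16 gradients don't underflow

for step in range(10):
    # requires_grad on the input lets checkpointing propagate gradients
    x = torch.randn(4, 1024, device="cuda", requires_grad=True)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # run the forward pass in fp16 where safe
        # Recompute activations in 4 segments during backward instead of
        # caching all 12 layers' activations -> large memory savings.
        out = checkpoint_sequential(model, 4, x)
        loss = out.float().pow(2).mean()  # dummy loss, just for illustration
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Both techniques trade compute for memory: autocast roughly halves activation memory, and checkpointing re-runs the forward pass for each segment during backward.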

2292384454 commented 2 years ago

> I used 8 A100 GPUs with 40G of memory each, and training takes 3-4 days on the 4M dataset. You may want to try fp16 training or gradient checkpointing to reduce memory usage.

Thanks a lot, I will try it.