guotong1988 / BERT-GPU

Multi-GPU pre-training on one machine for BERT from scratch without Horovod (data parallelism)
Apache License 2.0
173 stars, 54 forks

OOM error #19

Closed yygle closed 4 years ago

yygle commented 4 years ago

Could I ask which of the official pretrained BERT models you used? I am using the wwm_uncased_L-24_H-1024_A-16 model and easily run into an OOM error.

guotong1988 commented 4 years ago

The Base model, 12 layers.
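For a rough sense of why the 24-layer model runs out of memory where the 12-layer one does not, here is a minimal sketch that counts encoder parameters for the two sizes. It assumes the standard BERT layout (feed-forward width of 4x the hidden size, 512 positions, 2 segment types) and the 30522-token uncased vocabulary. The function names are made up for illustration and are not part of this repo. The memory figure only covers fp32 weights, gradients, and the two Adam moment buffers; activations, which grow with batch size and sequence length, and the pre-training heads are not counted.

```python
def bert_param_count(num_layers, hidden, vocab_size=30522,
                     max_positions=512, type_vocab=2):
    """Approximate parameter count of a BERT encoder (hypothetical helper).

    Assumes the standard layout: intermediate size = 4 * hidden,
    LayerNorm with a scale and a bias vector, and a pooler layer.
    """
    intermediate = 4 * hidden
    layer_norm = 2 * hidden  # gamma and beta

    # Word, position, and segment embeddings plus their LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Query, key, value, and output projections (weights + biases) plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers of the feed-forward block plus LayerNorm.
    ffn = (hidden * intermediate + intermediate
           + intermediate * hidden + hidden
           + layer_norm)

    pooler = hidden * hidden + hidden
    return embeddings + num_layers * (attention + ffn) + pooler


def adam_state_gib(params, bytes_per_value=4):
    """Memory in GiB for weights, gradients, and two Adam moments (assumed fp32)."""
    copies = 4  # weights + gradients + first moment + second moment
    return params * copies * bytes_per_value / 2**30


if __name__ == "__main__":
    for name, layers, hidden in [("Base  (L-12, H-768) ", 12, 768),
                                 ("Large (L-24, H-1024)", 24, 1024)]:
        n = bert_param_count(layers, hidden)
        print(f"{name}: {n:,} params, ~{adam_state_gib(n):.2f} GiB before activations")
```

Under these assumptions the Large model has about three times as many parameters as Base (roughly 335M versus 110M), so the optimizer state alone takes about 5.0 GiB instead of 1.6 GiB on every GPU, before any activations are stored. With data parallelism each GPU holds a full copy of the model, so adding GPUs does not reduce this per-device cost. Lowering the batch size or maximum sequence length is the usual first thing to try.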