guotong1988
/
BERT-GPU
Multi-GPU pre-training of BERT from scratch on a single machine, without Horovod (data parallelism)
Apache License 2.0
173
stars
54
forks
issues
Newest
How can gradient accumulation be implemented on top of this?
#35
600DZY
opened
1 year ago
1
model_fn should return an EstimatorSpec.
#34
Nanamumuhan
closed
2 years ago
5
An error like this: Segmentation fault (core dumped). Is the configuration wrong?
#33
Nanamumuhan
closed
2 years ago
9
Should lines 74-75 in optimization_gpu.py be commented out?
#32
rsindper
closed
3 years ago
1
After 10W (100,000) training steps finish, an error occurs in the do_eva stage
#31
rxc205
closed
3 years ago
3
Some questions about multi-GPU training
#30
rxc205
closed
3 years ago
13
train_batch_size and time required to pretrain
#29
Jimojimojimo
closed
3 years ago
2
【Try】1-GPU pretrain with big learning rate for 100W-step, then 1-GPU pretrain with small learning rate for another 100W-step.
#28
guotong1988
closed
2 years ago
0
《How To Pre-train BERT In GPUs》
#27
guotong1988
closed
2 years ago
0
Is num_train_steps the step count for one GPU or for multiple GPUs?
#26
zhengyima
closed
3 years ago
4
TensorFlow2 support
#25
guotong1988
closed
2 years ago
0
The `global_step` update in `optimization_gpu.py` (lines 74-75) is redundant.
#24
xuegsh
closed
3 years ago
2
GPT support
#23
guotong1988
closed
2 years ago
0
XLNet support
#22
guotong1988
closed
4 years ago
3
Question about "init_checkpoint" and "output_dir" checkpoint
#21
shuxiaobo
closed
4 years ago
9
Suffer the Error: tensorflow.python.framework.errors_impl.InvalidArgumentError
#20
shuxiaobo
closed
4 years ago
8
OOM error
#19
yygle
closed
4 years ago
1
Output model files compatible with Official Bert's pre-trained models?
#18
1e0ng
closed
5 years ago
9
Error when running run_pretraining_gpu_v2 with init_checkpoint
#17
ChrisMii
closed
5 years ago
3
run_pretraining_gpu.py not working
#16
652994331
closed
4 years ago
5
so many bugs in run_pretraining.py and run_pretraining_v2.py
#15
vanpersie32
closed
5 years ago
1
Difference between run_pretraining_v2.py and run_pretraining.py
#14
vanpersie32
closed
5 years ago
1
ModuleNotFoundError: No module named 'tensorflow.python.distribute.cross_device_ops'
#13
vanpersie32
closed
4 years ago
5
Cannot reload pre-trained model
#12
yick2232
closed
5 years ago
2
The model does not learn anything
#11
yumath
closed
5 years ago
17
ImportError: No module named 'tensorflow.python.distribute.cross_device_ops'
#10
yumath
closed
5 years ago
2
During eval, getting "ValueError: model_fn should return an EstimatorSpec". During training, OK
#9
aurotripathy
closed
5 years ago
6
ValueError: You must specify an aggregation method to update a MirroredVariable in Tower Context.
#8
nlp4whp
closed
5 years ago
2
I wonder why the reshaping is necessary?
#7
eduOS
closed
5 years ago
1
experiment result
#6
guotong1988
closed
5 years ago
1
Loss does not decrease...
#5
guotong1988
closed
5 years ago
1
Cannot train on multiple GPUs
#4
andy-yangz
closed
5 years ago
1
Slower than single GPU
#3
hankcs
closed
4 years ago
11
Error when running create_pretraining_data.py
#2
sportzhang
closed
5 years ago
2
This is just for pretraining BERT?
#1
HaishuoFang
closed
5 years ago
6