guotong1988 / BERT-GPU

Multi-GPU pre-training of BERT from scratch on one machine, without Horovod (data parallelism)
Apache License 2.0

Loss does not decrease #5

Closed guotong1988 closed 5 years ago

guotong1988 commented 5 years ago

Edit

```python
optimizer = optimization_gpu.create_optimizer(
    None, FLAGS.learning_rate, FLAGS.num_train_steps, FLAGS.num_warmup_steps, False)
```

to

```python
optimizer = tf.train.AdamOptimizer(learning_rate=FLAGS.learning_rate)
```
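For context: BERT's stock `create_optimizer` (in `optimization.py`, which `optimization_gpu` appears to adapt) wraps Adam with learning-rate warmup, linear decay, and weight decay, so swapping in `tf.train.AdamOptimizer` drops those schedules and uses a constant learning rate. The update rule that plain Adam applies can be sketched in NumPy; `adam_step` and the toy quadratic below are illustrative helpers, not code from this repo:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update -- the rule tf.train.AdamOptimizer implements."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(x) = x^2 (gradient 2x): the loss should steadily decrease.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)
```

After 2000 steps `x` sits near the minimum at 0. If the loss plateaus with the original `create_optimizer` call, a constant-rate Adam like this removes the warmup/decay schedule as a variable when debugging.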