Open gai1995 opened 4 years ago
Same question. I tried `tensor2tensor.utils.adafactor.AdafactorOptimizer` with TensorFlow 1.15 to train an ALBERT model, but the loss did not decrease either.
Same question
I even ran into an error:
`AttributeError: 'AdafactorOptimizer' object has no attribute 'get_gradients'`
Description
Hi, I want to use Adafactor to replace Adam in my code, but I do not use the T2T framework. Starting from the Google-released BERT fine-tuning code, I just copied the source of your Adafactor implementation and call it like this:

```python
optimizer = AdafactorOptimizer()
tvars = tf.trainable_variables()
grads = ...  # blabla... (computed elsewhere)
train_op = optimizer.apply_gradients(list(zip(grads, tvars)), name='train_op')
```
But it does not seem to work: the loss does not decrease and the accuracy is very low. I also notice the memory usage is almost the same as with Adam. What is going wrong here? Could anyone explain, or can Adafactor only be used under the T2T framework? ...
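For reference, the memory saving Adafactor promises comes from factoring the second-moment accumulator for matrix-shaped parameters, as described in the Adafactor paper (Shazeer & Stern, 2018). A rough back-of-the-envelope sketch (the function names below are illustrative, not tensor2tensor's API):

```python
# Why Adafactor should use less optimizer-state memory than Adam for an
# n x m weight matrix, per the Adafactor paper. Illustrative names only.

def adam_state_size(n, m):
    # Adam keeps two full moment matrices (first and second moments)
    # for every n x m weight: 2 * n * m extra numbers.
    return 2 * n * m

def adafactor_state_size(n, m):
    # Factored Adafactor (beta1 = 0) keeps only per-row and per-column
    # accumulators of the squared gradients: n + m numbers.
    return n + m

print(adam_state_size(1024, 4096))       # 8388608
print(adafactor_state_size(1024, 4096))  # 5120
```

If the observed memory is still Adam-like, it is worth checking that the factored path is actually being taken (e.g. `factored=True` and `beta1=0.0` in the copied implementation), since enabling a first moment brings back a full-size accumulator.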
UPDATE
It seems to work now, but the learning rate must be passed in manually. The problem is that Adafactor converges very slowly and performs worse than Adam, maybe because of the learning rate I chose. Are there any suggestions on how to fix this (how to choose a proper `decay_rate` and `learning_rate`)? Thanks!
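As a starting point, the Adafactor paper suggests an increasing second-moment decay and a relative step size that is capped early in training. A minimal sketch of those schedules, assuming the paper's defaults (check `tensor2tensor/utils/adafactor.py` for the exact defaults actually used when `learning_rate=None`):

```python
# Schedules suggested in the Adafactor paper (Shazeer & Stern, 2018).
# These are a sketch for choosing decay_rate / learning_rate by hand,
# not a copy of tensor2tensor's implementation.

def adafactor_decay_rate(step, exponent=0.8):
    # Second-moment decay grows toward 1: beta2_hat_t = 1 - t**(-exponent).
    return 1.0 - step ** (-exponent)

def adafactor_relative_step(step):
    # Relative step size rho_t = min(1e-2, 1/sqrt(t)):
    # constant 0.01 for the first 10k steps, then inverse-sqrt decay.
    return min(1e-2, step ** -0.5)

for step in (1, 100, 10000, 1000000):
    print(step, adafactor_decay_rate(step), adafactor_relative_step(step))
```

With `multiply_by_parameter_scale` enabled, the relative step is further scaled by each parameter's RMS, so the effective per-weight learning rate differs from Adam's; a grid search around these defaults is usually more productive than reusing the Adam learning rate directly.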
Environment information