For the learning rate decay schedule, why do you use "lr = get_lr(state["iter_num"]) if decay_lr else learning_rate"? Here, "state["iter_num"]" is the number of processed minibatches, not the number of optimizer (training) steps.
By the definition of learning rate decay, shouldn't we use "state["step_count"]" instead of "state["iter_num"]"?
I am referring to line 213 in TinyLlama/pretrain/tinyllama.py.
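To make the distinction concrete, here is a minimal, self-contained sketch (not the actual pretrain/tinyllama.py code): it assumes a nanoGPT-style cosine schedule with linear warmup and gradient accumulation, and all hyperparameter values, the `get_lr` body, and the loop below are illustrative only. With accumulation, "iter_num" advances once per micro-batch while "step_count" advances once per optimizer step, so keying the schedule on one or the other decays the learning rate at very different speeds:

```python
import math

# Illustrative hyperparameters, not TinyLlama's actual values.
learning_rate = 4e-4
min_lr = 4e-5
warmup_iters = 2000
max_iters = 600000

def get_lr(it: int) -> float:
    """Cosine decay with linear warmup, keyed on whatever counter `it` is."""
    if it < warmup_iters:
        return learning_rate * it / warmup_iters
    if it > max_iters:
        return min_lr
    ratio = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # goes from 1 down to 0
    return min_lr + coeff * (learning_rate - min_lr)

gradient_accumulation_steps = 8  # micro-batches per optimizer step (assumed)
state = {"iter_num": 0, "step_count": 0}

for _ in range(32):  # each micro-batch increments iter_num
    state["iter_num"] += 1
    if state["iter_num"] % gradient_accumulation_steps == 0:
        state["step_count"] += 1  # optimizer steps only every N micro-batches
        # The question: should the schedule advance per micro-batch ...
        lr_by_iter = get_lr(state["iter_num"])
        # ... or per optimizer step, as the definition of decay suggests?
        lr_by_step = get_lr(state["step_count"])
        print(state["iter_num"], state["step_count"], lr_by_iter, lr_by_step)
```

With this setup, "iter_num" runs 8x faster than "step_count", so a schedule keyed on "iter_num" warms up and decays 8x sooner than one keyed on "step_count", which is why the choice of counter matters.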