Closed: dgaines2 closed 2 years ago
I was having some issues with the current implementation of this callback, so I tried to make a change to it. Now that I'm looking at the unit test in test_callbacks.py, I have a quick question: why is model.fit() called twice on gen? Should this reduce the learning rate twice or just once? Based on the test, it looks like it should reduce only once.
-- nevermind, I realized the first test is on self.model and subsequent tests are on model
@dgaines2
Previously, this was used because the softmax calculation would explode when computing the exponential. I have since fixed the softmax calculation algorithm, and this callback is no longer needed. If nan happens, I think a better approach is to go back and check the data and optimizers and make sure there isn't anything wrong with those.
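The overflow mentioned above is the classic softmax instability: exponentiating large logits produces inf, and inf/inf then yields nan. A minimal sketch of the standard fix (subtracting the maximum logit before exponentiating, which leaves the result mathematically unchanged) -- this is an illustration of the technique, not the project's actual patch:

```python
import math

def stable_softmax(scores):
    """Numerically stable softmax.

    Subtracting the max shifts all logits to be <= 0, so exp() never
    overflows; the shift cancels in the normalization, so the output
    probabilities are identical to the naive formula.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The naive version, math.exp(1000.0), raises OverflowError (or gives
# inf with float32 arrays, leading to nan losses); the stable version
# handles the same logits without issue.
probs = stable_softmax([1000.0, 1000.0])
```

With equal logits of 1000.0, the stable version returns equal probabilities that sum to 1, where a naive implementation would overflow.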
Re the strange behavior of the generator: the initial class was written in TensorFlow 1.x, and the previous version worked as expected. The TensorFlow 2.x version is not compatible with some of the restarting mechanisms. I did not follow up on that because I think this callback is no longer needed. Please let me know otherwise.
Hey Chi, my main reason to use this callback was for early-stopping patience -- since the validation metrics aren't available to typical Keras callbacks with the way the main training loop is set up. Perhaps I could refactor this callback into an EarlyStopping callback that's usable without the _reduce_lr_and_load function. Not sure how you feel about this, but if you're open to it, I should be able to refactor it pretty quickly. Otherwise I can just write a callback on my own fork.
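For context, the patience bookkeeping such an EarlyStopping callback needs can be sketched in plain Python, independent of the Keras Callback machinery (the class name and method names here are illustrative, not the final implementation; in a real callback this logic would live in on_epoch_end and set self.model.stop_training = True):

```python
class PatienceTracker:
    """Tracks a validation metric and signals when training should stop.

    Assumes a loss-like metric where lower is better; `min_delta` is the
    smallest change that counts as an improvement.
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.wait = 0  # epochs since the last improvement

    def update(self, val_metric):
        """Record one epoch's metric; return True when patience runs out."""
        if val_metric < self.best - self.min_delta:
            self.best = val_metric
            self.wait = 0
            return False
        self.wait += 1
        return self.wait >= self.patience
```

For example, with patience=2, feeding validation losses 1.0, 0.9, 0.95, 0.96 stops training on the second consecutive non-improving epoch.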
@dgaines2 I think it is a great idea. It is definitely better to have just early stop separate from the current callback.
So, the only issue I can see arising now is the conflict between the patience in ReduceLRUponNan and a separate EarlyStopping callback -- both might fire on the same epoch end and probably log twice. I could entirely get rid of patience in ReduceLRUponNan (if that's okay with you), or I can leave it as it is for backwards compatibility. Do you have any opinion on how to handle this?
@dgaines2 Thanks. I think it makes more sense to get rid of patience in ReduceLRUponNan. IMO, this callback is not useful anymore; it is better to just add and use a new EarlyStopper.
I ended up removing ReduceLRUponNan entirely -- but if you'd rather I keep it and just remove the patience logic, I could do that instead. Let me know if you have any requested changes.
@dgaines2 Merged. Thank you!
Use the Keras backend to change the learning rate in ReduceLRUponNan (instead of re-compiling the model) and make slight changes to the logging.