titu1994 / keras-one-cycle

Implementation of One-Cycle Learning rate policy (adapted from Fast.ai lib)
MIT License

update one_cycle callback to use .fit arguments #8

Closed · hardianlawi closed this 5 years ago

hardianlawi commented 5 years ago

Some arguments do not need to be manually specified since they can be retrieved from the .fit params. Feel free to merge it if you think this could be beneficial.
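
Roughly, the idea is something like this: a minimal sketch assuming Keras 2.x, where .fit passes a params dict with keys such as 'samples', 'batch_size', and 'epochs' to every callback (the class and attribute names below are illustrative, not the actual PR code):

```python
import numpy as np
from keras.callbacks import Callback

class OneCycleSketch(Callback):
    def __init__(self, max_lr):
        super(OneCycleSketch, self).__init__()
        self.max_lr = max_lr

    def on_train_begin(self, logs=None):
        # Keras 2.x .fit fills self.params with 'samples', 'batch_size',
        # and 'epochs', so the caller no longer has to repeat them here.
        num_samples = self.params['samples']
        batch_size = self.params['batch_size']
        epochs = self.params['epochs']
        self.total_steps = int(np.ceil(num_samples / float(batch_size))) * epochs
```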

For the Python formatter, I used pylint, which is why it seems like there are a lot of changes; most of them are just formatting changes.

P.S. I only changed OneCycleLR, but the same changes can be made to LRFinder as well. I haven't pushed those changes because I want to make LRFinder support a generator for validation_data.

titu1994 commented 5 years ago

Also, in regards to:

> P.S. I only changed OneCycleLR, but the same changes can be made to LRFinder as well. I haven't pushed those changes because I want to make LRFinder support a generator for validation_data.

Currently, I randomly sample one batch of validation data from the full validation set each time I test the loss value. This introduces stochastic noise, since the loss may drop simply because the sampled validation examples happen to be "easier" or "harder" for the model.

In your opinion, would it be better to cache a single validation batch, or a few batches (say, 10 fixed batches, so that every class appears at least a few times) to test the loss? I haven't checked recently what Fast.ai does in this regard, so we could emulate them if need be.

hardianlawi commented 5 years ago

Now that you mention it, I don't see a reason to use validation_data at all. As I understand it, the purpose of LRFinder is to find a "good" learning rate at which training doesn't diverge, and that can be done by observing how the training loss changes with respect to the learning rate.
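
For reference, a training-loss-only range test can be fairly compact. Here is a hypothetical sketch (not code from this repo): it ramps the learning rate exponentially on each batch, records (lr, loss) pairs, and stops once the sweep is done; a good learning rate sits somewhat before the point where the recorded loss blows up.

```python
import numpy as np
from keras import backend as K
from keras.callbacks import Callback

class TrainLossLRFinder(Callback):
    def __init__(self, min_lr=1e-6, max_lr=1.0, num_steps=500):
        super(TrainLossLRFinder, self).__init__()
        self.lrs = np.geomspace(min_lr, max_lr, num_steps)  # exponential ramp
        self.history = []  # (learning rate, training loss) pairs
        self.step = 0

    def on_batch_begin(self, batch, logs=None):
        if self.step < len(self.lrs):
            K.set_value(self.model.optimizer.lr, self.lrs[self.step])

    def on_batch_end(self, batch, logs=None):
        if self.step < len(self.lrs):
            self.history.append((float(self.lrs[self.step]),
                                 (logs or {}).get('loss')))
        self.step += 1
        if self.step >= len(self.lrs):
            self.model.stop_training = True  # sweep finished
```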

titu1994 commented 5 years ago

I believe Fast.ai uses the validation loss because they want to find the point at which the model diverges on unseen data; NNs can generally keep reducing the training loss even at surprisingly high learning rates.

I really should check out Fast.ai's current implementation of the lr_find() method, but I have a somewhat large backlog to deal with as well.

titu1994 commented 5 years ago

Should I simply merge and make the formatting corrections myself?

hardianlawi commented 5 years ago

Hi,

I have some deadlines to chase, so it would be great if you don't mind doing it.

hardianlawi commented 5 years ago

Also, regarding the loss: I checked the paper, and they do use the validation loss.

Therefore, my suggestion is to cache a few batches to use for validation, and then renew the cache after a certain number of steps. I will try to do this, since I need it for my project.
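
Sketching what I have in mind (a hypothetical helper with made-up names; assumes in-memory NumPy validation arrays and a compiled Keras model):

```python
import numpy as np

class ValBatchCache(object):
    def __init__(self, x_val, y_val, batch_size, num_batches=10,
                 refresh_every=50, seed=0):
        self.x_val, self.y_val = x_val, y_val
        self.batch_size, self.num_batches = batch_size, num_batches
        self.refresh_every = refresh_every
        self.rng = np.random.RandomState(seed)
        self.calls = 0
        self._resample()

    def _resample(self):
        # Draw a fresh set of fixed batches from the full validation set.
        self.batches = []
        for _ in range(self.num_batches):
            idx = self.rng.choice(len(self.x_val), size=self.batch_size,
                                  replace=False)
            self.batches.append((self.x_val[idx], self.y_val[idx]))

    def loss(self, model):
        # Average the loss over the cached batches; renew the cache
        # periodically so the estimate isn't tied to one subset forever.
        if self.calls and self.calls % self.refresh_every == 0:
            self._resample()
        self.calls += 1
        losses = [model.test_on_batch(bx, by) for bx, by in self.batches]
        losses = [l[0] if isinstance(l, (list, tuple)) else l for l in losses]
        return float(np.mean(losses))
```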

crismaz commented 5 years ago

I'm working in Colab with fit_generator instead of fit, and the code failed for me (the keys weren't in the params dictionary). I could make it work by re-supplying some of the params (e.g. batch_size).
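
For anyone hitting the same thing: with fit_generator the callback params carry 'steps' (i.e. steps_per_epoch) but not 'samples' or 'batch_size', so the callback needs a fallback roughly like this (a hypothetical helper, not code from this repo):

```python
import math

def resolve_steps_per_epoch(callback_params, batch_size=None, num_samples=None):
    params = callback_params or {}
    # fit_generator supplies steps_per_epoch under the 'steps' key.
    if params.get('steps') is not None:
        return params['steps']
    # Plain .fit supplies 'samples' and 'batch_size' instead.
    batch_size = batch_size if batch_size is not None else params.get('batch_size')
    num_samples = num_samples if num_samples is not None else params.get('samples')
    if batch_size is None or num_samples is None:
        raise ValueError('Pass batch_size/num_samples explicitly when the '
                         '.fit params do not contain them (e.g. fit_generator).')
    return int(math.ceil(num_samples / float(batch_size)))
```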