Closed i3v closed 3 years ago
I would simply like to mention that save_model() saves the optimizer state, and it does seem illogical not to save the training history too.
Does that mean that currently the optimizer state is reset every time I call .fit()? For example:
for epoch in range(1000): model.fit(); prediction = model.predict(...);
I'm often doing this to extensively analyze the model's predictions after every epoch. But if I understand that issue correctly, this approach would reset the optimizer after every epoch and therefore would not make use of decay and so on? Or is initial_epoch solving this?
@jpeg729, I do agree with you. However, when I look at callbacks, I have a feeling that the devs never considered "comprehensive model state saving". So, it would be nice to hear from @fchollet (or someone else who's "able to see the big picture"): do they consider this as "not yet implemented" or as "this functionality is out of scope of this project"? Is there some roadmap/milestones for this or something? Or, maybe, such PRs (first for history, next for callbacks) would be OK, say, for keras-contrib? I have a feeling that this is more an architectural choice than straightforward coding.
@SebastianB12, the decay effect only depends on optimizer.iterations. You can check that after you call model.fit twice, model.optimizer.iterations.get_value() is doubled. So, you're safe, at least if you do not save your model to disk and reload it. The save_model docstring says that the optimizer's state is saved. But... It looks like this is not tested in "test_model_saving.py": as far as I can see, they only test that the model weights are loaded OK, and that we're still able to predict(...).
fit call) for "callback" objects. Thus, if you use ReduceLROnPlateau or something, the behavior might be different from what you expect.
Keras really needs a solution for this issue. It's really surprising that one cannot pause/resume a training loop safely, considering that the training of a model can take days or weeks. OK, one can, if one basically rewrites the training loop and gives up using the callbacks mechanism. But, honestly, I don't see a reason to prefer Keras to other high-level frameworks if I cannot use the callbacks.
I think that the root problem is that there's not an abstraction of the training loop, e.g. the Trainer class in Chainer. If you think about it, it's also really ugly to save the optimizer state (and maybe in future the callbacks state and the history) together with the model parameters using a method called "save_model". On the other hand, maybe introducing a "Trainer class" at this point of the development would be a change too big.
Do you guys know of any news regarding this problem?
I am currently trying to develop a binary classifier with Keras that will be trained on some initial data. Then, during runtime, more and more training data becomes available and I want to continue the training with this new data. If I understood correctly, a simple call of .fit() will reset the optimizer's attributes (e.g. learning rate)?
Did you guys find any working solution without rewriting big parts of Keras? I really need to find a solution for this issue, otherwise I won't be able to use Keras as a framework (even though I am not sure whether other frameworks will work).
@i3v, did you end up finding a suitable solution? Or have your ideas evolved about what would be useful here?
@collin, nothing changed, AFAIU. For now I simply:
Not too many lines of code to write, but it still feels a bit weird.
I think the question by @SebastianB12 also hasn't been answered yet, right?
Does Keras reset the optimizer if you call any of the fit functions in a loop instead of using the epochs argument?
If yes, I'd be really surprised that training my nets has worked fine so far.
@ViaFerrata : The question was already answered by @i3v. I also double checked it, and Keras does indeed not reset the optimizer as long as you do not save and reload the model. But thanks for double checking on that as well!
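The point about decay can be checked with plain arithmetic. The sketch below assumes the time-based decay rule used by the built-in Keras 2.x optimizers, lr_t = lr / (1 + decay * iterations); the function name decayed_lr and the batch counts are made up for illustration, and no Keras call is involved.

```python
def decayed_lr(base_lr, decay, iterations):
    """Time-based decay; assumed to match the Keras 2.x optimizer rule."""
    return base_lr / (1.0 + decay * iterations)


base_lr, decay = 0.01, 0.01
batches_per_fit = 100  # hypothetical: each fit() call processes 100 batches

# Case 1: the iteration counter survives between fit() calls,
# so the learning rate keeps shrinking from call to call.
iterations = 0
for call in range(3):
    iterations += batches_per_fit
    print("counter kept :", call, decayed_lr(base_lr, decay, iterations))

# Case 2: the counter is lost before every call (e.g. the optimizer
# was rebuilt), so each call sees the same learning rate again.
for call in range(3):
    iterations = batches_per_fit
    print("counter reset:", call, decayed_lr(base_lr, decay, iterations))
```

As long as the counter is kept, calling fit in a loop and calling it once with more epochs follow the same schedule.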
I hate Keras. I love Keras.
It should be a simple parameter in the .fit command, i.e. (weight_state=OneOf['reset after successive calls', 'dont reset'], optimizer_state=OneOf['reset', 'dont reset']). This way people could re-initialize weights after a training loop OR not, and do the same for the optimizer state, without having to do hacky stuff like creating variables in memory and re-instantiating after each loop iteration, saving to disk and reloading, etc.
Still an awesome if not the best DL lib out there in my opinion.
This is still an issue for me as well. Obviously it is not particularly difficult to work around the history issues (I just write a function to wrap around .fit or .fit_generator), but it does seem somewhat essential functionality and so a bit boggling Keras does not choose to adopt it, especially (imo) in keeping things pythonic and non-hacky -- and this must be a part of any successful DL library, as Keras has already shown in other areas.
@SebastianB12 So it does reset when you save and load. How can I continue training after loading a model if I'm using an Adam optimizer?
@r8drascal: I did not try that. But from the other comments above I assumed that Keras really resets the optimizer when the model is saved/loaded. Unfortunately, I currently do not have the time to test it. However, if I understand the newest FAQ and the save_model method correctly, it kind of saves the optimizer state: https://github.com/keras-team/keras/blob/master/keras/models.py . `
if include_optimizer and hasattr(model, 'optimizer'):
    if isinstance(model.optimizer, optimizers.TFOptimizer):
        warnings.warn(
            'TensorFlow optimizers do not '
            'make it possible to access '
            'optimizer attributes or optimizer state '
            'after instantiation. '
            'As a result, we cannot save the optimizer '
            'as part of the model save file.'
            'You will have to compile your model again '
            'after loading it. '
            'Prefer using a Keras optimizer instead '
            '(see keras.io/optimizers).')
    else:
        f.attrs['training_config'] = json.dumps({
            'optimizer_config': {
                'class_name': model.optimizer.__class__.__name__,
                'config': model.optimizer.get_config()
            },
            'loss': model.loss,
            'metrics': model.metrics,
            'sample_weight_mode': model.sample_weight_mode,
            'loss_weights': model.loss_weights,
        }, default=get_json_type).encode('utf8')
        # Save optimizer weights.
        symbolic_weights = getattr(model.optimizer, 'weights')
        if symbolic_weights:
            optimizer_weights_group = f.create_group('optimizer_weights')
            weight_values = K.batch_get_value(symbolic_weights)
            weight_names = []
            for i, (w, val) in enumerate(zip(symbolic_weights,
                                             weight_values)):
                # Default values of symbolic_weights is /variable
                # for Theano and CNTK
                if K.backend() == 'theano' or K.backend() == 'cntk':
                    if hasattr(w, 'name'):
                        if w.name.split('/')[-1] == 'variable':
                            name = str(w.name) + '_' + str(i)
                        else:
                            name = str(w.name)
                    else:
                        name = 'param_' + str(i)
                else:
                    if hasattr(w, 'name') and w.name:
                        name = str(w.name)
                    else:
                        name = 'param_' + str(i)
                weight_names.append(name.encode('utf8'))
            optimizer_weights_group.attrs['weight_names'] = weight_names
            for name, val in zip(weight_names, weight_values):
                param_dset = optimizer_weights_group.create_dataset(
                    name,
                    val.shape,
                    dtype=val.dtype)
                if not val.shape:
                    # scalar
                    param_dset[()] = val
                else:
                    param_dset[:] = val
`
Can anyone more knowledgeable than me confirm that?
@SebastianB12 That's strange.. my model does not seem to work even though I have the latest versions of keras (2.1.5) and tensorflow (1.6.0).
opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
rnn_model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
rnn_model.fit(X, Y, batch_size = 5, epochs=20)
rnn_model.save('./models/my_model.h5')
#This predicts correctly
model = load_model('my_model.h5')
model.predict(x)
#This does NOT predict correctly
model=load_model('my_model.h5')
model.fit(X, Y, batch_size = 5, epochs=1)
model.predict(x)
The second model does not predict correctly because the weights are getting updated after the one epoch of training...
Are you expecting the weights to be fixed despite training?
@pGit1 By not predicting correctly I mean the predictions are completely off as if it's an untrained model. I understand how gradient descent works.
Update
I haven't figured out the root of the problem. But it seems that the model that I was loading was saved on Keras 2.0.6 and I am loading it on to Keras 2.1.5. Something with the "save_weights" and "load_weights" functions was not working, so I had to load the weights layer by layer on an architecture I built from scratch manually (loading the architecture from the saved model using json worked as well):
for layer_loaded, layer_built in zip(loaded_model.layers, built_model.layers):
    layer_built.set_weights(layer_loaded.get_weights())
@r8drascal Wait so the example you gave in your previous comment above was using the model saved on Keras 2.0.6? Did you get a chance to try again with a model compiled with Keras 2.1.5 ?
@plaffitte Sorry, I wasn't clear. Basically, I loaded the old model and saved it in Keras 2.1.5 and reloaded the new one, which wasn't working. This would've been the full code structure--I missed the loading of the old model in the first line.
##Loading Keras 2.0.6 model##
rnn_model = load_model('old_model.h5')
opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
rnn_model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
rnn_model.fit(X, Y, batch_size = 5, epochs=20)
rnn_model.predict(x) ##The predictions look decent
rnn_model.save('./models/my_model.h5') ##Saving it in Keras 2.1.5##
##These predictions are of the same quality as before##
model = load_model('my_model.h5') ##Loading Keras 2.1.5 model##
model.predict(x)
##These predictions are way off as if it's a completely untrained model
model=load_model('my_model.h5') ##Loading Keras 2.1.5 model##
model.fit(X, Y, batch_size = 5, epochs=1)
model.predict(x)
@r8drascal Have you tried printing out the value of the learning rate (with the get_value() method or something, I guess)? My model has been training for the past couple of days, so I'm not able to test it myself, sorry...
@plaffitte The original model had a learning rate of 0.01. However, the optimizer I'm compiling with is 0.0001 as indicated in my code. I tried compiling without defining the optimizer (i.e. by using the loaded model optimizer) and the results were worse. What's strange is that when I run the program on my course server (deeplearning.ai Coursera), which is using Keras 2.0.7, everything runs perfectly with the above code.
Super weird. Maybe a bug?
@i3v Can you share a snippet of your custom callback?
I accumulate the history on each fit call into a histories variable.
It sort of makes sense that a single "fit", which has a number of epochs, has its own history, because the next time you call fit you may have changed characteristics of the model, like unfreezing layers, etc.
But I would like a comprehensive checkpoint/restore style in Keras, in which you can stop training with a SIGINT and then restart the training again exactly where it left off, preserving the epoch index, the training weights, the optimizer state, the hyperparameters, etc.
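A minimal sketch of the "histories" idea, assuming each fit call yields a dict that maps a metric name to a list of per-epoch values (the shape of the History object's history attribute). The helper names merge_histories and save_training_state and the JSON layout are hypothetical, not part of Keras.

```python
import json


def merge_histories(histories):
    """Concatenate per-fit history dicts (metric -> per-epoch values)."""
    merged = {}
    for history in histories:
        for metric, values in history.items():
            # float() keeps the result JSON-serializable even if the
            # values arrive as NumPy scalars.
            merged.setdefault(metric, []).extend(float(v) for v in values)
    return merged


def save_training_state(path, histories):
    """Write the merged history and the number of finished epochs to JSON."""
    merged = merge_histories(histories)
    epochs_done = max((len(v) for v in merged.values()), default=0)
    with open(path, "w") as f:
        json.dump({"epochs_done": epochs_done, "history": merged}, f)
    return epochs_done


# Stand-ins for the history dicts of two consecutive fit() calls.
histories = [
    {"loss": [0.9, 0.7], "acc": [0.5, 0.6]},
    {"loss": [0.6], "acc": [0.65]},
]
merged = merge_histories(histories)
print(merged["loss"])
```

The stored epochs_done value could then be passed as initial_epoch on the next fit call, so that epoch numbering continues instead of restarting at zero. This covers only the history and the epoch index; optimizer and callback state are a separate matter.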
Can someone please explain in detail how to resume training on a partially trained model when using the ReduceLROnPlateau callback?
Say, I trained the model to 50 epochs, then saved the model using the ModelCheckpoint callback as an hdf5 file.
Now I want to resume training from epoch 51 by using load_model() and then model.fit(). The problem is I'm getting unexpected accuracies on resuming training, which is probably because the callbacks are reset on a model save/load.
I've found references as to how a histories variable can be used to restore the callbacks and the last learning rate used, but I'm not sure how to do it. Can someone give an example?
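To make the question concrete, here is a toy illustration of which bookkeeping would have to survive a save/load for a reduce-on-plateau rule to continue where it stopped. PlateauState is a hypothetical stand-in written for this sketch, not the Keras ReduceLROnPlateau class, and the JSON round trip is only one assumed way of persisting it.

```python
import json


class PlateauState:
    """Toy stand-in for the counters a reduce-on-plateau rule relies on."""

    def __init__(self, lr, factor=0.5, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")  # best validation loss seen so far
        self.wait = 0             # epochs since the last improvement

    def update(self, val_loss):
        """Record one epoch's validation loss and return the current lr."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr *= self.factor
                self.wait = 0
        return self.lr

    def to_json(self):
        return json.dumps(self.__dict__)

    @classmethod
    def from_json(cls, text):
        state = cls(lr=0.0)
        state.__dict__.update(json.loads(text))
        return state


tracker = PlateauState(lr=0.01)
for loss in [1.0, 0.8, 0.9]:
    tracker.update(loss)
saved = tracker.to_json()  # would be stored next to the model file

# Resuming with the saved counters: this is the second epoch without
# improvement over 0.8, so the learning rate is halved.
resumed = PlateauState.from_json(saved)
resumed.update(0.85)

# Resuming with fresh counters: 0.85 looks like a new best value,
# so nothing happens and the earlier bad epoch is forgotten.
fresh = PlateauState(lr=0.01)
fresh.update(0.85)
print(resumed.lr, fresh.lr)
```

The difference between resumed and fresh is the kind of mismatch that could explain unexpected behaviour after a reload, if the callback starts from scratch while the model does not.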
Is it even possible to do online learning in Keras? I am not very sure (even though it is 2019).
I know the issue is kind of stale, but I wrote a wrapper that periodically saves and auto-concatenates the model history dict from a pickle file, as well as the last epoch number and model weights. I'd love to hear your thoughts.
pip install keras-buoy https://github.com/dorukkarinca/keras-buoy
Is CSVLogger callback an alternative for this? Or do we still need to manually pickle and append the history object for smooth plotting?
Can someone please explain in detail how to resume training on a partially trained model when using the ReduceLROnPlateau callback?
Say, I trained the model to 50 epochs, then saved the model using the ModelCheckpoint callback as an hdf5 file.
Now I want to resume training from epoch 51 by using load_model() and then model.fit(). The problem is I'm getting unexpected accuracies on resuming training, which is probably because the callbacks are reset on a model save/load.
I've found references as to how a histories variable can be used to restore the callbacks and the last learning rate used, but I'm not sure how to do it. Can someone give an example?
I had the same doubt. From my understanding, Keras can resume training from the latest learning rate. Here's the StackOverflow thread discussing the issue.
history
I wonder why each fit(...) call resets the history property. (The same happens for fit_generator, of course.) Moreover, in addition to that, callbacks.History also resets itself in on_train_begin for some reason. This looks like the lifespan of the History object is intentionally limited to a single fit(...) call. But I don't actually get why. Probably just because it is also an output argument?
This behavior looks reasonable for "transfer learning", but is not convenient if the user wishes to:
Personally, I'd like to do both. And I found that it doesn't "just work" in the 2.0.4. Currently, AFAIU, there's no convenient, out-of-the-box, way to do either.
The "history" behavior could be fixed by replacing this line with:
Also, History.on_train_begin should be changed somehow (renamed to __init__?).
So... Am I missing something? Would such changes break anything? Or is it breaking "the overall way how things ought to work here"? If so, what is a nice example of initial_epoch use? Is there a chance that a PR implementing this would be accepted?
callbacks
The history property is not the only thing which performs a "reset" in on_train_begin. Most callbacks do that. Even in the docs, on_train_begin is used like __init__ and resets the state to initial. This should be modified in order to fully support those two use cases above. Though these modifications look like a separate (and much larger) piece of work.
Approach 1:
ReduceLROnPlateau could, potentially, be patched to automatically restore the correct state in on_train_begin, based on model.history (thus behaving like before if the history is empty), instead of blindly resetting their state. This would allow the user to manually save-load-adjust their state, as well as adding some additional built-in save_callbacks_state method.
ReduceLROnPlateau state. Probably, this could be easily achieved by simply making the _reset method "public".
Approach 2:
ReduceLROnPlateau) treat the on_train_begin event as "reset, start from scratch". But, say, CSVLogger makes an attempt to "continue": to re-open the file that was used before. On the other hand, it actually needs this call to "continue", to open the output file. Thus, we cannot just make launching on_train_begin methods conditional (like proposed for history above). Neither can we extract the "inner part" of the fit_generator method to a separate method. So, it looks like the essence of on_train_begin is not fully clear, and it might be a good idea to separate it into on_train_begin + on_train_continue or into on_train_reset + on_train_continue, to support the desired use cases discussed above. After that, the "inner part" of fit_generator could be "extracted".
Those 3 issues I've mentioned in this text make me think that this functionality might be interesting for some users. But a significant amount of changes seems to be required. Or... maybe I'm missing something, and there's some easier way?
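The on_train_reset + on_train_continue split could look roughly like the sketch below. Everything in it is hypothetical: the hook names come from the proposal above, while ResumableCallback, EpochCounter and the toy fit function are invented for illustration and do not exist in Keras.

```python
class ResumableCallback:
    """Hypothetical base class separating 'start fresh' from 'carry on'."""

    def on_train_reset(self):
        """Called only when training starts from scratch."""

    def on_train_continue(self):
        """Called at the start of every fit() call, including the first."""

    def on_epoch_end(self):
        """Called after each epoch."""


class EpochCounter(ResumableCallback):
    def on_train_reset(self):
        self.epochs_seen = 0
        self.runs = 0

    def on_train_continue(self):
        # Work that is needed on every call (CSVLogger re-opening its
        # output file would belong here).
        self.runs += 1

    def on_epoch_end(self):
        self.epochs_seen += 1


def fit(callbacks, epochs, resume=False):
    """Toy training loop that only drives the callback hooks."""
    for cb in callbacks:
        if not resume:
            cb.on_train_reset()
        cb.on_train_continue()
    for _ in range(epochs):
        for cb in callbacks:
            cb.on_epoch_end()


counter = EpochCounter()
fit([counter], epochs=3)               # fresh start
fit([counter], epochs=2, resume=True)  # continue, state is kept
print(counter.epochs_seen, counter.runs)
```

With this split, a callback that must always do some setup puts it in on_train_continue, while state that should survive a resumed run is only initialised in on_train_reset.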
existing workarounds
history could be easily concatenated outside keras, if needed.
Thus, keras is already flexible enough.