Open hjalmarheld opened 1 year ago
Much appreciated. I will incorporate it in the next update. Is there a 2.11-compatible way of caching an optimizer with its internal state?
NB the main reason I've added this is that I am struggling with AWS SageMaker's propensity to die on me with 504 Gateway timeouts. The machine dies; I have to stop it, start it, and re-run the whole lot again. Caching at least allows me to continue training from the last caching point. Do you happen to have an idea how to address the issue of AWS dying?
I haven't dug into the details, to be frank, but I found that SciKeras had a similar problem; see the discussion here. They seem to have since implemented another approach, with new util functions found here.
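For what it's worth, one TF 2.11-compatible way to persist an optimizer together with its internal (slot) state is tf.train.Checkpoint plus tf.train.CheckpointManager, which also gives a natural resume point when an instance dies. A minimal sketch with a placeholder model and training loop (none of the names below come from this repo):

import tensorflow as tf

# Placeholder model/optimizer; the real deep hedging agent would go here.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam()

# Checkpoint tracks both the model weights and the optimizer's slot variables,
# so training can resume with the optimizer's internal state intact.
ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer, epoch=tf.Variable(0))
manager = tf.train.CheckpointManager(ckpt, directory="./ckpts", max_to_keep=3)

# On (re)start: restore the latest checkpoint if one exists.
if manager.latest_checkpoint:
    ckpt.restore(manager.latest_checkpoint)
    print(f"Resumed from {manager.latest_checkpoint} at epoch {int(ckpt.epoch)}")

for epoch in range(int(ckpt.epoch), 100):
    # ... one epoch of training ...
    ckpt.epoch.assign(epoch + 1)
    manager.save()  # periodic save; survives the instance dying mid-run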
As for SageMaker, are you talking about notebook instances? Sounds like it could be a RAM issue. Have you tried running it on a larger instance?
Actually, I found it. It appears plotting ate up memory. I've also coded up TF 2.11 serialisation, but haven't tested it yet. Will let you know. The new branch also has recurrent agents, an initial delta state, and no longer unrolls the main training loop. I've also got TensorBoard support, even though I haven't managed to get the profiler going.
The current code is incompatible with TensorFlow 2.11 due to updates of the optimizer APIs.
The Adam and RMSprop optimizers no longer have the get_weights() method. See some info here. A quick fix is to simply fall back to the legacy namespace.
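For context, the difference is easy to see on a TF 2.11 install (a quick standalone check, not code from this repo):

import tensorflow as tf

new_opt = tf.keras.optimizers.RMSprop()             # TF 2.11: new optimizer class
legacy_opt = tf.keras.optimizers.legacy.RMSprop()   # pre-2.11 behaviour preserved

print(hasattr(new_opt, "get_weights"))     # False on TF 2.11
print(hasattr(legacy_opt, "get_weights"))  # True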
This implies changing one line in trainer.py and one line in _trainerserialize.ipynb.
In trainer.py change:
optimzier = config.train("optimizer", "RMSprop", help="Optimizer" )
to:
optimzier = config.train("optimizer", tf.keras.optimizers.legacy.RMSprop(), help="Optimizer" )
And in _trainerserialize.ipynb, set the optimizer equivalently:
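The original cell isn't quoted above, but presumably the change mirrors the trainer.py edit: construct the optimizer from the legacy namespace and hand it to the config instead of the "RMSprop" string. A minimal sketch (the attribute-style config assignment is an assumption about how the notebook sets values):

import tensorflow as tf

# Use the legacy optimizer so get_weights()/set_weights() remain available.
legacy_rmsprop = tf.keras.optimizers.legacy.RMSprop()

# Assumption: the notebook exposes a `config` object whose values can be
# assigned like this; adjust to however the notebook actually sets "optimizer".
# config.train.optimizer = legacy_rmsprop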
Apparently the old optimizers are to be kept indefinitely, so this should be a relatively stable solution.