Bidek56 closed this issue 4 years ago
It turns out that, due to a TF 2.0 bug, loading a model from an HDF5 file is not possible.
Well... we are loading from an init checkpoint which is stored in HDF5 when we pass the parameter `--init_checkpoint=${ALBERT_DIR}/tf2_model.h5`, so I suppose there is some support for it?
Loading `--init_checkpoint=${ALBERT_DIR}/tf2_model.h5` works; what does not work is saving a trained model to an .h5 file and using it for predictions.
I see what you are saying. Are you trying to make a frozen inference graph? Like this: https://datascience.stackexchange.com/questions/33975/what-is-the-difference-between-tensorflow-saved-model-pb-and-frozen-inference-gr
Just trying to save the trained model for subsequent serving with Flask.
We're in the same boat. We are doing it on Databricks. We had some extra errors with the callbacks, so we commented them out, borrowed your code, and put it in a custom callback, and it works now. So cheers.

```python
import datetime
import os

import tensorflow as tf


# `model` and `FLAGS` come from the surrounding training script.
class MyCustomCallback(tf.keras.callbacks.Callback):
    def on_train_batch_begin(self, batch, logs=None):
        print('Training: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

    def on_train_batch_end(self, batch, logs=None):
        print('Training: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))
        # Export the model as a SavedModel after every training batch.
        print("saving model as per callback to:", os.path.join(FLAGS.output_dir, "1"))
        tf.saved_model.save(model, os.path.join(FLAGS.output_dir, "1"))
        print("model saved")

    def on_test_batch_begin(self, batch, logs=None):
        print('Evaluating: batch {} begins at {}'.format(batch, datetime.datetime.now().time()))

    def on_test_batch_end(self, batch, logs=None):
        print('Evaluating: batch {} ends at {}'.format(batch, datetime.datetime.now().time()))
```
@Bidek56 do you mind sharing a little more information? Exactly which line in which file did you add `tf.saved_model.save(model, os.path.join(FLAGS.output_dir, "1"))` to? Also, why are you doing an os.path.join? Are you trying to make a subfolder called "1" to save to?
I have added the following line of code after this line:
tf.saved_model.save(model, os.path.join(FLAGS.output_dir, "1") )
os.path.join just appends a "1" subfolder to output_dir; it's plain Python.
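A quick illustration of what that call produces. The directory below is a hypothetical stand-in for `FLAGS.output_dir`, not a path from the actual training script:

```python
import os

# Hypothetical stand-in for FLAGS.output_dir (an assumption for illustration).
output_dir = "/tmp/albert_output"

# os.path.join inserts the platform's path separator between the two parts,
# so the export directory is simply a subfolder named "1" inside output_dir.
export_dir = os.path.join(output_dir, "1")

print(export_dir)
print(os.path.basename(export_dir))
```

Nothing is created on disk by `os.path.join` itself; it only builds the path string that is then handed to `tf.saved_model.save`.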
@birdmw Can you tell me what changes you made to add your custom callbacks? When I run training, it prints results at the end of each epoch, but I want to see them at every step. I am not able to figure it out. Can you help?
In order to save the model, I have added this line after the training loop:
tf.saved_model.save(model, os.path.join(FLAGS.output_dir, "1") )
in order to get: assets, saved_model.pb and variables. From there, I am trying to load the model and predict a single value:
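A minimal sketch of that save-then-load round trip, using a toy `tf.Module` in place of the trained ALBERT model. The `Scaler` class, its `predict` method, and the temporary directory are hypothetical names chosen for illustration; only `tf.saved_model.save` and `tf.saved_model.load` are taken from the discussion and the TensorFlow API:

```python
import os
import tempfile

import tensorflow as tf


class Scaler(tf.Module):
    """Toy stand-in for a trained model (hypothetical, for illustration only)."""

    def __init__(self):
        super().__init__()
        self.scale = tf.Variable(2.0)

    # A fixed input signature lets the function be traced and exported.
    @tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
    def predict(self, x):
        return self.scale * x


# Same layout as in the thread: a "1" subfolder inside the output directory.
export_dir = os.path.join(tempfile.mkdtemp(), "1")
tf.saved_model.save(Scaler(), export_dir)

# The export directory now holds saved_model.pb plus the variables folder.
print(sorted(os.listdir(export_dir)))

# Restore the model and run a prediction on a single value.
loaded = tf.saved_model.load(export_dir)
result = loaded.predict(tf.constant([3.0]))
print(result.numpy())
```

For a real Keras model the restored object usually exposes its inference function through `loaded.signatures`, so the call used for prediction may differ from the `predict` method assumed here.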
This solution works for a single value. Thanks