hunglc007 / tensorflow-yolov4-tflite

YOLOv4, YOLOv4-tiny, YOLOv3, YOLOv3-tiny implemented in TensorFlow 2.0 and Android. Convert YOLOv4 .weights to TensorFlow, TensorRT and TFLite.
https://github.com/hunglc007/tensorflow-yolov4-tflite
MIT License
2.24k stars 1.24k forks

3x slower when using tf.keras.models.load_model instead of tf.saved_model.load #306

Open dexter2406 opened 3 years ago

dexter2406 commented 3 years ago

I want to extract intermediate results, so I need a keras.Model() to rebuild the model with more outputs. However, when I load with tf.keras.models.load_model, the running time becomes 3x slower...

The example in detect.py uses another load method:

from tensorflow.python.saved_model import tag_constants

saved_model_loaded = tf.saved_model.load(weights_path, tags=[tag_constants.SERVING])
infer = saved_model_loaded.signatures['serving_default']
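To illustrate what the snippet above returns: the `serving_default` signature is a concrete (graph-mode) function that takes an input tensor and returns a dict of output tensors. The following is a minimal, self-contained sketch using a tiny stand-in model (the real code loads the converted YOLOv4 SavedModel instead):

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical tiny model standing in for the exported YOLOv4 SavedModel.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(2, name="pred")(inputs)
model = tf.keras.Model(inputs, outputs)

export_dir = os.path.join(tempfile.mkdtemp(), "tiny_model")
tf.saved_model.save(model, export_dir)

# Same loading pattern as detect.py: grab the serving signature.
loaded = tf.saved_model.load(export_dir)
infer = loaded.signatures["serving_default"]

batch = tf.random.uniform((1, 4))
result = infer(batch)  # dict: output name -> tensor
first_output = list(result.values())[0]
print(first_output.shape)  # (1, 2) for this stand-in model
```

The signature call runs as a pre-traced graph function, which is part of why this path tends to be faster than invoking a freshly loaded Keras model eagerly.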

When I change it to saved_model_loaded = tf.keras.models.load_model(self.weights_path, compile=False) to load it as a Keras object instead, detection runs 3x slower (though it still works). This is odd, because in convert_model.py the code uses the Keras API to export:

model = tf.keras.Model(input_layer, pred)
utils.load_weights(model, ...)
model.save(...)

I wonder why this happens... Isn't it natural to use Keras to load it? And how should I fix it? Thanks!
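One likely explanation (a sketch, not from this thread): the SavedModel signature runs as a compiled graph function, while calling a loaded Keras model directly runs op-by-op in eager mode. Wrapping the Keras model's forward pass in tf.function usually recovers graph-mode speed. A small stand-in model is used below for self-containment; in the real code you would wrap the model loaded via tf.keras.models.load_model:

```python
import tensorflow as tf

# Hypothetical small model standing in for the loaded YOLOv4 Keras model.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(2)(inputs)
model = tf.keras.Model(inputs, outputs)

# Trace the forward pass once, then reuse the graph on every call,
# mirroring what a SavedModel serving signature does internally.
@tf.function
def infer_fn(x):
    return model(x, training=False)

x = tf.random.uniform((1, 4))
y = infer_fn(x)  # first call traces; later calls reuse the compiled graph
print(y.shape)   # (1, 2)
```

With the YOLOv4 model, the first infer_fn call pays the tracing cost, and subsequent frames should run at speed comparable to the signature-based path while still exposing the extra Keras outputs.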