Could you please provide a comparison of the time difference when loading the model with the .keras format versus the other format?
For more details on the changes included with the .keras format and why it is preferred over other formats, refer to https://keras.io/guides/serialization_and_saving/
I don't have an example at the moment, but we recently updated our prod system from Keras 2 to Keras 3 and converted all legacy saved models to the new Keras 3 format, which led to our service taking over 12 minutes to load all models (>15 models loading in subprocesses in parallel). Moving to `from_config` + `load_weights` reduced the time to ~2 minutes (which is on par with what we had before).
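For context, the fast path we switched to looks roughly like this (a minimal sketch; the file names and the use of `GPT2CausalLM` are illustrative, not our exact prod code):

```python
import json

import keras_nlp

# Rebuild the architecture from a plain JSON config, then load the
# weights from an h5 file; both were exported once from the old model.
with open("model_config.json") as f:
    config = json.load(f)

model = keras_nlp.models.GPT2CausalLM.from_config(config)
model.load_weights("model.weights.h5")
```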
For what it's worth, before we did that migration I was already working on `GPT2Backbone` models with keras-nlp and noticed the same issue, where loading the .keras model was really slow (but I didn't give it much thought at the time).
What you're using is actually the same as what `load_model` is using, except for the interaction with the zip file. So perhaps the zip file reading is the issue.
100%, which is why I find this very odd.
I encountered this issue before when trying to quantize Gemma
I have created this script to demonstrate the issue (using GPT-2)
`check_loading.py`:

```python
import argparse
import json

import keras
import keras_nlp


def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-m",
        "--mode",
        default="save",
        choices=["save", "load", "load_weights"],
    )
    args = parser.parse_args()
    return args


def main(args):
    if args.mode == "save":
        model = keras_nlp.models.GPT2CausalLM.from_preset("gpt2_base_en")
        # Save keras file
        model.save("model.keras")
        # Save serialized config and weights
        config = keras.saving.serialize_keras_object(model)
        with open("model.json", "w") as f:
            json.dump(config, f)
        model.save_weights("model.weights.h5")
    elif args.mode == "load":
        model = keras.saving.load_model("model.keras")
    else:
        with open("model.json", "r") as f:
            config = json.load(f)
        model = keras.saving.deserialize_keras_object(config)
        model.load_weights("model.weights.h5")


if __name__ == "__main__":
    keras.config.disable_traceback_filtering()
    main(get_args())
```
Usage:

```bash
# 1. Save the model
python check_loading.py -m save

# 2. Profile `load_model`
pyinstrument python check_loading.py -m load

# 3. Profile `deserialize_keras_object` and `load_weights`
pyinstrument python check_loading.py -m load_weights
```
The result:
| Method | Cost Time |
|---|---|
| `load_model` | 27.861s |
| `deserialize_keras_object` + `load_weights` | 3.166s |
By diving into the example provided by @james77777778, in the hidden frames there's a call chain: `Group.__getitem__` -> `ZipExtFile.seek`. This makes sense when we are reading from an archive.

In the Python stdlib, `zipfile.ZipExtFile` goes `seek` -> `read` -> `_read1` -> `_update_crc`. The overhead caused by `_update_crc` during each `seek()` call is significant.

Reference: https://github.com/python/cpython/blob/f878d46e5614f08a9302fcb6fc611ef49e9acf2f/Lib/zipfile/__init__.py#L1133
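A standalone microbenchmark (not from the issue) that isolates this: compare random-access reads on a `ZipExtFile` against the same reads on an in-memory copy. Every backward seek on the `ZipExtFile` rewinds and re-decompresses (and re-checksums) the member from its start:

```python
import io
import os
import random
import time
import zipfile

SIZE = 20 * 1024 * 1024  # one 20 MB member of throwaway test data

with zipfile.ZipFile("test.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("blob.bin", os.urandom(SIZE))

positions = [random.randrange(0, SIZE - 1024) for _ in range(100)]

def bench(fileobj):
    start = time.perf_counter()
    for pos in positions:
        fileobj.seek(pos)  # backward seeks force a full re-read on ZipExtFile
        fileobj.read(1024)
    return time.perf_counter() - start

with zipfile.ZipFile("test.zip") as zf:
    with zf.open("blob.bin") as member:
        t_zip = bench(member)
    t_mem = bench(io.BytesIO(zf.read("blob.bin")))

print(f"ZipExtFile: {t_zip:.2f}s  BytesIO: {t_mem:.2f}s")
```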
A simple way to deal with it, which will work fine: change line 624 to `self.io_file = io.BytesIO(self.archive.open(self.root_path, "r").read())`
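The same pattern outside of `saving_lib`, for illustration (a minimal sketch assuming a standard .keras archive layout with a `model.weights.h5` member):

```python
import io
import zipfile

import h5py

with zipfile.ZipFile("model.keras") as zf:
    # Slow: h5py seeks all over the ZipExtFile, and each backward seek
    # re-decompresses the member from the start.
    # f = h5py.File(zf.open("model.weights.h5"), "r")

    # Workaround: read the member once into memory so seeks are free.
    buf = io.BytesIO(zf.read("model.weights.h5"))
    f = h5py.File(buf, "r")
    print(list(f.keys()))
```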
That probably fixes the speed issue but would lead to unwanted extra memory usage which is undesirable
> That probably fixes the speed issue but would lead to unwanted extra memory usage which is undesirable
Is that a good tradeoff? Should we instead unzip on disk then load from the h5 file? What do you think @james77777778 @Grvzard?
> Is that a good tradeoff?
Generally, it should be okay to load the entire h5 into memory before loading; this is already the case when saving.
We can also provide an option to let users decide whether to use a faster but more memory-intensive approach.
> Should we instead unzip on disk then load from the h5 file?
Actually, h5py doesn't recommend using file-like objects: https://docs.h5py.org/en/stable/high/file.html#python-file-like-objects
So, unzipping and then loading from the H5 file might be a better approach, IMO.
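A sketch of that approach (extract the weights member to a temporary directory so h5py gets a real file path; the member name assumes the standard .keras layout):

```python
import tempfile
import zipfile

import h5py

with zipfile.ZipFile("model.keras") as zf:
    with tempfile.TemporaryDirectory() as tmpdir:
        # One sequential decompression pass, then h5py seeks on disk.
        path = zf.extract("model.weights.h5", tmpdir)
        with h5py.File(path, "r") as f:
            print(list(f.keys()))
```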
> So, unzipping and then loading from the H5 file might be a better approach
Same.
Anyone experiencing unreasonably slow load times when loading a keras-format saved model? I have noticed this repeatedly when working in IPython, where simply instantiating a model via `Model.from_config` and then calling `model.load_weights` is much faster (by several factors) than loading a `model.keras` file. My understanding is that the keras format is simply a zip file containing the config.json file and the weights h5 (iirc), but weirdly enough, something is not right while loading.