keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras
Apache License 2.0

Gemma Model Storing and Loading after Fine tuning #1482

Open kreouzisv opened 4 months ago

kreouzisv commented 4 months ago

Hi there, I encountered a strange bug when trying to load the gemma-2b model using keras-nlp after fine-tuning.

My fine-tuning code is the following:

```python
def fine_tune(self, X, y):
    data = generate_training_prompts(X, y)

    # Enable LoRA fine-tuning
    self.model.backbone.enable_lora(rank=self.config['lora_rank'])

    # Reduce the input sequence length to limit memory usage
    self.model.preprocessor.sequence_length = self.config['tokenization_max_length']

    # Use AdamW (a common optimizer for transformer models)
    optimizer = keras.optimizers.AdamW(
        learning_rate=self.config['learning_rate'],
        weight_decay=self.config['weight_decay'],
    )

    # Exclude layernorm and bias terms from decay
    optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

    self.model.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
        sampler=self.config['sampler'],
    )

    self.model.fit(data, epochs=self.config['epochs'], batch_size=self.config['batch_size'])

    # Define the output directory name
    fine_tuned_dir_name = f'fine_tuned_{self.config["basemodel"]}_{datetime.now().strftime("%Y%m%d_%H%M%S")}'
    fine_tuned_dir_path = os.path.join('models', fine_tuned_dir_name)

    # Create the directory if it doesn't exist
    if not os.path.exists(fine_tuned_dir_path):
        os.makedirs(fine_tuned_dir_path)

    # Save the model (a full .keras archive) in the directory with a specific name
    weights_file_path = os.path.join(fine_tuned_dir_path, 'weights.keras')
    self.model.save(weights_file_path)

    # Save the model configuration within the same directory
    model_config = create_model_config(self.config, np.unique(y).tolist())  # ensure `class_names` is defined or adapt as necessary
    config_filename = os.path.join(fine_tuned_dir_path, 'model_config.json')
    with open(config_filename, 'w') as json_file:
        json.dump(model_config, json_file, indent=4)

    # Push the model weights and config to wandb
    # Note: you may need to adjust this depending on how wandb expects files to be saved
    wandb.save(os.path.join(fine_tuned_dir_path, '*'))
```
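(Editorial aside: the original comment above says "save only the weights", but `model.save()` writes a full `.keras` archive, including the architecture and any compile state. If only the variables are wanted, a rough sketch using Keras 3's `save_weights`/`load_weights` would look like the following; the preset name and restore steps are illustrative, not taken from the issue.)

```python
# Hypothetical weights-only alternative; Keras 3 requires the filename
# to end in ".weights.h5".
weights_only_path = os.path.join(fine_tuned_dir_path, 'model.weights.h5')
self.model.save_weights(weights_only_path)

# To restore, rebuild the model from the preset, re-enable LoRA with the
# same rank, and then load the variables back in:
# restored = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
# restored.backbone.enable_lora(rank=self.config['lora_rank'])
# restored.load_weights(weights_only_path)
```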

The training completes as expected in Keras. However, when I try to load the model using the weights.keras file created by the script above, I get two unexpected behaviors; see the loading script below:

```python
import keras

loaded_model = keras.saving.load_model(
    "/data/host-category-classification/nlp/classification/Gemma/models"
    "/fine_tuned_gemma-2b_20240229_151158/weights.keras"
)

print(loaded_model.summary())
```

First, I observed that each call to the loading process generates an unknown set of files that occupy roughly 10 GB of disk space and are never cleaned up. Second, the loading process takes forever (I haven't measured the exact time, but it should not take more than 10 minutes to load) compared to the gemma `from_preset` method. Do you have any suggestions? There seems to be no documentation in either KerasNLP or TensorFlow regarding model storage and loading for Gemma-related models.

kreouzisv commented 4 months ago

In addition, when loading I get this output:

```
UserWarning: compile() was not called as part of model loading because the model's compile() method is custom. All subclassed Models that have compile() overridden should also override get_compile_config() and compile_from_config(config). Alternatively, you can call compile() manually after loading.
  instance.compile_from_config(compile_config)
```
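Following the warning's own suggestion, a rough sketch of re-compiling manually after loading (mirroring the settings from the fine-tuning script; the path is shortened and the learning rate is a placeholder):

```python
import keras

# Load the fine-tuned model (path shortened here).
loaded_model = keras.saving.load_model("models/fine_tuned_gemma-2b_.../weights.keras")

# Re-compile manually, mirroring the fine-tuning configuration.
loaded_model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(learning_rate=1e-4),  # placeholder value
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
```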

SamanehSaadat commented 3 months ago

Hi @kreouzisv!

Would it be possible for you to provide a colab that reproduces this issue?

josharian commented 2 months ago

> the loading process takes forever [...] compared to the gemma `from_preset` method

I have observed this as well. I'm at 53 minutes of CPU time on a very high-end Mac laptop and still waiting for the load to complete. [EDIT: it completed at the 56-minute mark.]

The reproducer is no more complicated than using keras.callbacks.ModelCheckpoint followed by keras.saving.load_model.
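A minimal sketch of that reproducer (the preset name and toy dataset are illustrative placeholders, not taken from my actual run):

```python
import keras
import keras_nlp

# Build a Gemma model from a preset (assumed preset name).
model = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

# Save a full-model checkpoint at the end of each epoch.
checkpoint = keras.callbacks.ModelCheckpoint(filepath="checkpoint.keras")

# Any small dataset works; the content is irrelevant to the reproducer.
model.fit(["a tiny placeholder example"], epochs=1, callbacks=[checkpoint])

# Loading the checkpoint back is the slow step being reported.
restored = keras.saving.load_model("checkpoint.keras")
```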

I suspect--without proof--that unzipping the .keras file is a meaningful part of this. For unrelated reasons, I unzipped a .keras file and found it was excruciatingly slow. (Pity that moving to zstd would be a breaking change.)

I also suspect--without proof--that the optimizer state is getting saved and restored, which will significantly increase the disk and load times vs a from_preset with no optimizer. I don't see an obvious load_model API knob in the docs to disable restoring the optimizer state to try out.
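(A possible experiment here, assuming `keras.saving.load_model`'s documented `compile` argument; whether it actually skips deserializing the saved optimizer state is unverified:)

```python
import keras

# compile=False skips re-compiling the loaded model; whether it also avoids
# reading the optimizer variables out of the archive is the thing to measure.
model = keras.saving.load_model("checkpoint.keras", compile=False)
```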

josharian commented 2 months ago

> I suspect--without proof--that unzipping the .keras file is a meaningful part of this.

OK, ran a quick experiment, and I have some proof now. :)

I took a model I had saved as .keras, the same one referenced above.

I did:

```sh
unzip model.keras
mkdir contents
mv assets *.json *.h5 contents
cd contents
zip -0 -r model_store *
mv model_store.zip ../model_store.keras
```
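(For context on the flags: `zip -0` stores entries without compression, so reading the large weight files back out of the archive requires no decompression pass.)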

This increased the file size from ~7.4 GB to ~12.8 GB.

But it reduced the time required to open the model from ~56 minutes to ~5 minutes.

> I also suspect--without proof--that the optimizer state...

It appears I was wrong about this.