Hi there, I encountered a strange bug when trying to load the gemma-2b model using KerasNLP.

My fine-tuning code is the following:

```python
def fine_tune(self, X, y):
    data = generate_training_prompts(X, y)

    # Enable LoRA fine-tuning
    self.model.backbone.enable_lora(rank=self.config['lora_rank'])

    # Reduce the input sequence length to limit memory usage
    self.model.preprocessor.sequence_length = self.config['tokenization_max_length']

    # Use AdamW (a common optimizer for transformer models)
    optimizer = keras.optimizers.AdamW(
        learning_rate=self.config['learning_rate'],
        weight_decay=self.config['weight_decay'],
    )
    # Exclude layer norm and bias terms from weight decay
    optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

    self.model.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
        sampler=self.config['sampler'],
    )
    self.model.fit(data, epochs=self.config['epochs'], batch_size=self.config['batch_size'])

    # Define the output directory name
    fine_tuned_dir_name = f'fine_tuned_{self.config["basemodel"]}_{datetime.now().strftime("%Y%m%d_%H%M%S")}'
    fine_tuned_dir_path = os.path.join('models', fine_tuned_dir_name)

    # Create the directory if it doesn't exist
    if not os.path.exists(fine_tuned_dir_path):
        os.makedirs(fine_tuned_dir_path)

    # Save only the weights in the directory with a specific name
    weights_file_path = os.path.join(fine_tuned_dir_path, 'weights.keras')
    self.model.save(weights_file_path)

    # Save the model configuration within the same directory
    # Ensure you have `class_names` defined or adapt as necessary
    model_config = create_model_config(self.config, np.unique(y).tolist())
    config_filename = os.path.join(fine_tuned_dir_path, 'model_config.json')
    with open(config_filename, 'w') as json_file:
        json.dump(model_config, json_file, indent=4)

    # Push model weights and config to wandb
    # Note: you may need to adjust this depending on how wandb expects files to be saved
    wandb.save(os.path.join(fine_tuned_dir_path, '*'))
```
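For context, `self.model` is a Gemma causal LM built from a KerasNLP preset elsewhere in the class; a rough sketch of that setup is below (the preset name and sequence length here are placeholders, not my exact configuration):

```python
import keras_nlp

# Rough sketch of how self.model is created elsewhere in the class
# (preset name and sequence length are placeholders).
model = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
model.preprocessor.sequence_length = 512
```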
The training completes as expected in Keras. However, when I try to load the model from the `weights.keras` file created by the script above, I see two unexpected behaviors. The loading script is below:
```python
import keras

loaded_model = keras.saving.load_model(
    "/data/host-category-classification/nlp/classification/Gemma/models"
    "/fine_tuned_gemma-2b_20240229_151158/weights.keras"
)
print(loaded_model.summary())
```
First, each call to the loading code generates an unknown set of files that occupy roughly 10 GB of disk space and are never cleaned up. Second, loading takes a very long time (I haven't measured the exact duration, but it should not take more than 10 minutes) compared to loading the model with the `from_preset` method. Do you have any suggestions? There seems to be no documentation in either KerasNLP or TensorFlow on model saving and loading for Gemma-related models.
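For comparison, loading from the official preset is fast, and the weights-only round trip below is roughly what I expected to be able to use instead of a full `load_model`. This is only a sketch; the preset name and file paths are placeholders, not my actual setup:

```python
import keras_nlp

# Loading from the official KerasNLP preset is quick compared to
# keras.saving.load_model on the saved .keras file above.
model = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

# Weights-only round trip I expected to use instead of saving the whole model.
# Keras 3 requires the filename to end in ".weights.h5" for save_weights/load_weights.
model.save_weights("fine_tuned.weights.h5")
model.load_weights("fine_tuned.weights.h5")
```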