keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras
Apache License 2.0

Add `from_huggingface` method to KerasNLP models #1294

Open shivance opened 9 months ago

shivance commented 9 months ago

Add support for loading huggingface model checkpoints in KerasNLP backbones

Is your feature request related to a problem? Please describe. Currently, KerasNLP backbones only load pretrained weights from their own standard checkpoints. However, there are many fine-tuned checkpoints on the Hugging Face Hub that already solve a lot of practical problems. If we add support for loading HF checkpoints, we can bring Keras's multi-backend promise and KerasNLP's modular design to much more of the NLP community.

Describe the solution you'd like Implement a `from_huggingface` method that takes a checkpoint name from Hugging Face. All it requires is mapping layer names and implementing the checkpoint-conversion scripts as methods.
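The layer-name mapping idea could be sketched roughly as below. This is a hypothetical illustration only: the weight names and the `HF_TO_KERAS_NLP` table are made up for the example, and a real converter would also have to reshape or transpose tensors, not just rename them.

```python
# Hypothetical sketch: convert an HF-style weight dict to KerasNLP-style
# variable names by renaming keys according to a per-architecture table.
HF_TO_KERAS_NLP = {
    "bert.embeddings.word_embeddings.weight": "token_embedding/embeddings",
    "bert.embeddings.position_embeddings.weight": "position_embedding/embeddings",
}

def convert_checkpoint(hf_weights):
    """Rename HF checkpoint keys to their (hypothetical) KerasNLP equivalents."""
    converted = {}
    for hf_name, tensor in hf_weights.items():
        keras_name = HF_TO_KERAS_NLP.get(hf_name)
        if keras_name is None:
            raise ValueError(f"No mapping for HF weight {hf_name!r}")
        converted[keras_name] = tensor
    return converted
```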

Alternative solution Instead of implementing a separate method, we could extend the `from_preset` method to accept Hugging Face checkpoints.

I'm up for contributing this feature.

cc: @abheesht17

mattdangerw commented 9 months ago

I will check with other folks here, but I think this is something we probably will not want to pursue. We could mirror all our own presets on huggingface, or make it easier to bulk convert hf checkpoints offline, but I do not think converting huggingface checkpoints "live" will be a good solution.

I think right now it makes sense to continue integrating with Kaggle (https://github.com/keras-team/keras-nlp/pull/1292), which will help us define an externally friendly format for our presets. Once we have that, we could consider exposing a set of tools to automatically convert Hugging Face models to our format on a best-effort basis. This would never include all models or all Hugging Face config options (I just don't see that being feasible), but it could be easy to use and sidestep the performance issues mentioned above.

Wauplin commented 6 months ago

Hey there :hugs:

I think there is some confusion in this issue between two different topics:

  1. Allowing keras_nlp to load transformers-based checkpoints: this seems to be what's described by @shivance, right?
  2. Loading models from the Hugging Face Hub.

The HF Hub is a platform to host and share all kinds of models, and not only transformers ones. While topic 1. might be tricky for reasons explained by @mattdangerw in https://github.com/keras-team/keras-nlp/issues/1294#issuecomment-1802427809, I do think hosting KerasNLP models on the HF Hub would make sense. Now that both KaggleHub and GS presets are supported, adding a new preset provider doesn't seem too complex.
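Since KaggleHub and GS presets are already dispatched by prefix, adding a provider could come down to inspecting the URI scheme. A hypothetical sketch (the function and the provider registry are illustrative, not KerasNLP's actual internals):

```python
# Hypothetical sketch: route a preset string to a provider based on its
# URI scheme, mirroring the existing kaggle:// and gs:// handling.
from urllib.parse import urlparse

PROVIDERS = {"kaggle", "gs", "hf"}

def parse_preset(preset):
    """Split a preset string into (scheme, path).

    Built-in preset names have no scheme and are returned as (None, name).
    """
    parsed = urlparse(preset)
    if parsed.scheme in PROVIDERS:
        return parsed.scheme, parsed.netloc + parsed.path
    return None, preset
```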

I have actually worked on a fork to showcase what the implementation would look like: https://github.com/keras-team/keras-nlp/compare/master...Wauplin:keras-nlp:huggingface-hub-integration. The integration requires the huggingface_hub library. Authentication can be configured with the HF_TOKEN environment variable (only needed for private models or for uploads, similarly to KaggleHub).

Here is a Colab notebook showcasing it.

```python
import keras_nlp
from keras_nlp.models import BertClassifier
from keras_nlp.utils.preset_utils import save_to_preset

classifier = BertClassifier.from_preset("bert_base_en_uncased")
(...)  # train/retrain/fine-tune

# Save to Hugging Face Hub
save_to_preset(classifier, "hf://Wauplin/bert_base_en_uncased_retrained")

# Reload from Hugging Face Hub
classifier_reloaded = BertClassifier.from_preset("hf://Wauplin/bert_base_en_uncased_retrained")
```

Here is what it looks like once uploaded to the Hub: https://huggingface.co/Wauplin/bert_base_en_uncased_retrained/tree/main. If we go this way, I think we should also upload a default model card with a keras-nlp tag to make all KerasNLP models discoverable on the Hub.
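A default model card could be as simple as a generated README.md with Hub YAML metadata. A hypothetical sketch (the exact fields and tags would need to follow Hub model-card conventions; `default_model_card` is an illustrative name, not part of the fork):

```python
def default_model_card(model_name):
    """Build a minimal README.md with YAML front matter tagging keras-nlp."""
    return (
        "---\n"
        "library_name: keras-nlp\n"
        "tags:\n"
        "- keras\n"
        "---\n"
        f"# {model_name}\n\n"
        "This model was saved with KerasNLP's `save_to_preset` and can be "
        "reloaded with `from_preset(\"hf://...\")`.\n"
    )
```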

WDYT? I am willing to help create a PR if this is of interest to the Keras team. It is essentially what's already in the fork, plus some documentation and testing. On the Hugging Face side, we could make KerasNLP an official library (e.g. searchable, with code snippets, download counts, etc.).

Disclaimer: I work at Hugging Face as a maintainer of the huggingface_hub library.

Wauplin commented 5 months ago

Following my comment above, I've opened https://github.com/keras-team/keras-nlp/pull/1510 to continue the discussion :)

mattdangerw commented 5 months ago

Thanks! Overall totally agree with your comment.

Let's add hf:// flows for saving and uploading, so people can easily host/share weights on the Hugging Face model hub. We are hoping to expose a public form of save_to_preset this week; @SamanehSaadat is working on this. So we might wait to merge that PR until we have the whole picture of download/upload sorted. But let's get it in! (And thanks very much!)

Re: conversion, I still think a solid set of tooling for converting bidirectionally between transformers format and KerasNLP format is important, at least for popular architectures (Gemma, Llama, Mistral, Falcon, Bloom...). But I see more design questions there to nail down. Let's start with the hub integration and keep figuring out what we want for conversion tooling.

Note that for Gemma, @nkovela1 did give us a tool for HF export: https://github.com/keras-team/keras-nlp/blob/master/tools/gemma/export_gemma_to_hf.py, so a flow of fine-tuning with Keras and exporting to vLLM, TGI, etc. is possible. But I suspect we might want to move tooling like that into the library proper at some point.
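One recurring detail in export tools like this is weight layout: PyTorch's `nn.Linear` stores its weight as (out_features, in_features), while a Keras `Dense` kernel is (in_features, out_features), so dense kernels typically get transposed on the way out. A minimal NumPy illustration of that step (not the actual export code, and the function name is made up):

```python
import numpy as np

def keras_dense_to_hf_linear(kernel, bias):
    """Transpose a Keras Dense kernel into PyTorch Linear layout.

    Keras kernel: shape (in_features, out_features).
    PyTorch weight: shape (out_features, in_features); bias is unchanged.
    """
    return np.ascontiguousarray(kernel.T), bias
```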

mattdangerw commented 5 months ago

Draft for public saving API -> https://github.com/keras-team/keras-nlp/pull/1512, though expect some changes. Comments welcome!