keras-team / keras-hub

Pretrained model hub for Keras 3

[T5 1.1] Enable v1.1 Presets #1948

Closed DavidLandup0 closed 3 weeks ago

DavidLandup0 commented 1 month ago

It turns out we already support T5 1.1 operationally (the gated activations are already implemented), but only vanilla T5 models were exposed through weight conversion and presets.
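
For context, the 1.1 variants mainly differ from vanilla T5 in using a gated GELU feedforward instead of plain ReLU and in not tying the embedding weights. A rough sketch of how that maps onto the existing T5Backbone arguments; the argument names reflect my reading of the current constructor, and the sizes are small illustrative values rather than a real preset config:

import keras_hub

# Sketch of a T5 1.1-style backbone configuration. The key differences
# from vanilla T5 are the gated GELU feedforward and the untied embeddings.
# Sizes here are illustrative, not a real preset config.
backbone = keras_hub.models.T5Backbone(
    vocabulary_size=32128,
    num_layers=2,
    num_heads=4,
    hidden_dim=256,
    intermediate_dim=512,
    key_value_dim=64,
    activation="gelu",            # 1.1 uses GELU instead of ReLU
    use_gated_activation=True,    # gated feedforward (GEGLU)
    tie_embedding_weights=False,  # 1.1 unties input/output embeddings
)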

This PR updates the conversion script to include the T5 1.1 variants.

For example:

t5_small = keras_hub.models.T5Backbone.from_preset("t5_1.1_small")
tokenizer = keras_hub.models.T5Tokenizer.from_preset("t5_1.1_small")

It also updates the conversion script to use the save_to_preset() functionality, fixes assertions that raised exceptions, and saves the tokenizer as well.
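
As a rough sketch of the save/upload flow the conversion script now follows (the local directory and Kaggle handle below are placeholders, and the upload call reflects my understanding of the current keras_hub API):

import keras_hub

# Stand-ins for the converted objects in the real script; an existing
# preset is loaded here only to illustrate the save/upload flow.
backbone = keras_hub.models.T5Backbone.from_preset("t5_1.1_small")
tokenizer = keras_hub.models.T5Tokenizer.from_preset("t5_1.1_small")

preset_dir = "./t5_1.1_small"         # local output directory (placeholder)
backbone.save_to_preset(preset_dir)   # writes the config and weights
tokenizer.save_to_preset(preset_dir)  # writes the tokenizer assets

# Placeholder Kaggle handle; requires Kaggle credentials.
keras_hub.upload_preset("kaggle://<user>/t5/keras/t5_1.1_small", preset_dir)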

Numerical Equivalence

small = keras_hub.models.T5Backbone.from_preset("t5_1.1_small")
keras_tokenizer = keras_hub.models.T5Tokenizer.from_preset("t5_1.1_small")

Behaves equivalently to:

hf_tokenizer = transformers.AutoTokenizer.from_pretrained("google/t5-v1_1-small")
hf_model = transformers.T5ForConditionalGeneration.from_pretrained("google/t5-v1_1-small")
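
Roughly, the check feeds the same token ids to both models and compares the encoder outputs. The sketch below continues from the snippets above (small, hf_tokenizer, hf_model); the input/output key names follow the existing T5Backbone signature as I understand it and should be treated as an assumption:

import keras
import numpy as np
import torch

# Feed the exact same token ids to both models so the comparison
# isolates the converted weights rather than the tokenizers.
text = ["The quick brown fox jumped over the lazy dog."]
hf_inputs = hf_tokenizer(text, return_tensors="pt")

with torch.no_grad():
    hf_encoder_out = hf_model.encoder(
        input_ids=hf_inputs["input_ids"],
        attention_mask=hf_inputs["attention_mask"],
    ).last_hidden_state.numpy()

keras_out = small(
    {
        "encoder_token_ids": hf_inputs["input_ids"].numpy(),
        "encoder_padding_mask": hf_inputs["attention_mask"].numpy().astype(bool),
        # A single pad token on the decoder side; only the encoder
        # output is compared here.
        "decoder_token_ids": np.zeros((1, 1), dtype="int32"),
        "decoder_padding_mask": np.ones((1, 1), dtype=bool),
    }
)["encoder_sequence_output"]

keras_encoder_out = keras.ops.convert_to_numpy(keras_out)
print("max abs diff:", np.abs(keras_encoder_out - hf_encoder_out).max())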

PCA on the flattened outputs, run on the same input:

[Image: PCA of the flattened outputs from both models]

Notes

The XXL version (11B params, 44 GB of weights) is too large for consumer hardware, so I can't run the conversion script on it. I'll get the XL weights up on Kaggle as soon as the download finishes.

/cc @divyashreepathihalli

divyashreepathihalli commented 3 weeks ago

@DavidLandup0 the GPU tests are failing, can you please take a look?

DavidLandup0 commented 3 weeks ago

> @DavidLandup0 the GPU tests are failing, can you please take a look?

@divyashreepathihalli - Fixed. There was a missing Kaggle link for the XL preset.