huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add FLAN-UL2 #21917

Closed shermansiu closed 1 year ago

shermansiu commented 1 year ago

Model description

UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 also introduces the notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.
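For context, mode switching in practice means prepending a paradigm token such as [NLU], [NLG], or [S2S] to the input, which selects one of the pre-training denoising modes. A minimal sketch with the existing google/ul2 checkpoint (prompt format follows the UL2 model card; bf16 and device_map="auto" are optional conveniences that assume accelerate is installed):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/ul2")
model = T5ForConditionalGeneration.from_pretrained(
    "google/ul2", torch_dtype=torch.bfloat16, device_map="auto"
)

# "[S2S]", "[NLU]", and "[NLG]" select the corresponding pre-training
# denoising modes; "<extra_id_0>" is the usual T5 sentinel for infilling.
inputs = tokenizer(
    "[S2S] As a large language model, UL2 can <extra_id_0>",
    return_tensors="pt",
).input_ids.to(model.device)
outputs = model.generate(inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```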

FLAN-UL2 has the same configuration as the original UL2 20B model, except that it has been instruction-tuned with Flan.

Open source status

Provide useful links for the implementation

The model architecture (UL2) is already in Hugging Face Transformers. The 20B model weights are here: https://github.com/google-research/google-research/tree/master/ul2#checkpoints

shermansiu commented 1 year ago

@DanielHesslow (since you ported the original UL2 weights): I'd like to contribute, but I'm not sure how to convert the weights from JAX to PyTorch.

DanielHesslow commented 1 year ago

I had a dirty dirty script which unfortunately lives on my old dev machine that I don't have with me at the moment 😅

I basically just loaded the T5 weights and went through and renamed everything to match the HF format.
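If it helps, the gist was something like the sketch below (a rough reconstruction from memory, not the actual script; the t5x loader call mirrors the one in transformers' own convert_t5x_checkpoint_to_pytorch.py for T5, and the rename rules shown are illustrative only, since the real mapping needs a rule per parameter family):

```python
import numpy as np
import torch
from flax.traverse_util import flatten_dict
from t5x import checkpoints  # assumes a t5x installation

# Load the raw T5X/JAX checkpoint as a flat dict of "a/b/c" -> array.
ckpt = checkpoints.load_t5x_checkpoint("/path/to/ul2/checkpoint")
params = flatten_dict(ckpt["target"], sep="/")

def rename(jax_name: str) -> str:
    # Illustrative only: the full script needs rules for attention q/k/v/o,
    # MLP wi/wo, layer norms, relative position bias, embeddings, etc.
    name = jax_name.replace("layers_", "block.").replace("/", ".")
    name = name.replace("attention.query.kernel", "layer.0.SelfAttention.q.weight")
    return name

state_dict = {}
for jax_name, value in params.items():
    tensor = torch.from_numpy(np.asarray(value))
    if jax_name.endswith("kernel"):
        # Flax dense kernels are (in, out); PyTorch Linear weights are (out, in).
        tensor = tensor.T
    state_dict[rename(jax_name)] = tensor

torch.save(state_dict, "pytorch_model.bin")
```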

ArthurZucker commented 1 year ago

Hey! Thanks for opening this issue. The weights will be available on the Hub soon! We are converting them with @younesbelkada.

shermansiu commented 1 year ago

The model is already out (https://huggingface.co/google/flan-ul2)! @younesbelkada has a Space comparing Flan-T5-XXL and Flan-UL2 here: https://huggingface.co/spaces/ybelkada/i-like-flan-ul2
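For anyone who wants to try it, a minimal generation sketch along the lines of the model card (bf16 and device_map="auto" are optional conveniences that assume accelerate is installed; the ~20B weights need roughly 40 GB in bf16):

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = T5ForConditionalGeneration.from_pretrained(
    "google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto"
)

# Flan-style instruction prompt; no UL2 mode token is needed here.
inputs = tokenizer(
    "Answer the following question by reasoning step by step. "
    "If there are 3 apples and you eat one, how many are left?",
    return_tensors="pt",
).input_ids.to(model.device)
outputs = model.generate(inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```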