shermansiu closed this issue 1 year ago
@DanielHesslow (since you ported the original UL2 weights). I would like to contribute, but I'm not too sure how to convert the weights from JAX to PyTorch.
I had a dirty dirty script which unfortunately lives on my old dev machine that I don't have with me at the moment 😅
I basically just loaded the T5 weights and went through and renamed everything to match the HF format.
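The renaming approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the original script: the exact T5X/JAX parameter names and their HF counterparts are assumptions here and would need to be checked against the real checkpoints.

```python
import re

# Hypothetical name mapping from T5X-style parameter paths to HF
# T5/UL2 state-dict keys. The patterns below are illustrative guesses;
# the real conversion script would need one entry per parameter group.
RENAMES = [
    (r"^encoder/layers_(\d+)/attention/query/kernel$",
     r"encoder.block.\1.layer.0.SelfAttention.q.weight"),
    (r"^encoder/layers_(\d+)/attention/key/kernel$",
     r"encoder.block.\1.layer.0.SelfAttention.k.weight"),
    (r"^encoder/layers_(\d+)/attention/value/kernel$",
     r"encoder.block.\1.layer.0.SelfAttention.v.weight"),
]

def rename_param(name: str) -> str:
    """Map one JAX-style parameter name to an HF-style key."""
    for pattern, repl in RENAMES:
        new, count = re.subn(pattern, repl, name)
        if count:
            return new
    return name  # leave unmapped names untouched for manual inspection

def rename_state_dict(params: dict) -> dict:
    """Apply the renaming to a whole flat parameter dict."""
    return {rename_param(k): v for k, v in params.items()}
```

In practice one would also transpose kernels where JAX and PyTorch use different weight layouts before saving with `torch.save` or `save_pretrained`.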
Hey! Thanks for opening this issue. The weights will be available on the Hub soon! We are converting them with @younesbelkada
The model is already out! (https://huggingface.co/google/flan-ul2) @younesbelkada has a space comparing Flan-T5-XXL and Flan-UL2 here: https://huggingface.co/spaces/ybelkada/i-like-flan-ul2
Model description
UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 also introduces the notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.
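Mode switching in the UL2 paper works by prepending a sentinel token to the input that tells the model which denoising regime to operate in. A minimal sketch, assuming the paper's pairing of denoisers to mode tokens (R-denoising → `[NLU]`, X-denoising → `[NLG]`, S-denoising → `[S2S]`):

```python
# Assumed mapping of UL2 denoiser types to their mode tokens,
# per the UL2 paper; verify against the released tokenizer.
MODE_TOKENS = {
    "R": "[NLU]",  # regular span-corruption denoising
    "X": "[NLG]",  # extreme (long-span / high-rate) denoising
    "S": "[S2S]",  # sequential (prefix-LM style) denoising
}

def with_mode(text: str, denoiser: str) -> str:
    """Prefix an input with the mode token for the chosen denoiser."""
    return f"{MODE_TOKENS[denoiser]} {text}"
```

Note that for FLAN-UL2 specifically, the instruction-tuned checkpoint is reported to work without mode tokens, so this prefixing mainly applies to the original UL2 20B model.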
FLAN-UL2 has the same configuration as the original UL2 20B model, except that it has been instruction tuned with Flan.
Open source status
Provide useful links for the implementation
The model architecture (UL2) is already in Hugging Face Transformers. The 20B model weights are here: https://github.com/google-research/google-research/tree/master/ul2#checkpoints