shermansiu closed this issue 1 year ago
@DanielHesslow (since you ported the original UL2 weights). I would like to contribute, but I'm not too sure how to convert the weights from JAX to PyTorch.
I had a dirty dirty script which unfortunately lives on my old dev machine that I don't have with me at the moment 😅
I basically just loaded the T5 weights and went through and renamed everything to match the HF format.
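The renaming approach described above can be sketched roughly as follows. This is a hypothetical illustration, not the original script: the exact T5X/JAX parameter names and their HF counterparts are assumptions here and would need to be checked against the real checkpoints.

```python
import re

# Hypothetical name mapping from T5X-style parameter paths to HF
# T5/UL2 state-dict keys. The patterns below are illustrative guesses;
# the real conversion script would need one entry per parameter group.
RENAMES = [
    (r"^encoder/layers_(\d+)/attention/query/kernel$",
     r"encoder.block.\1.layer.0.SelfAttention.q.weight"),
    (r"^encoder/layers_(\d+)/attention/key/kernel$",
     r"encoder.block.\1.layer.0.SelfAttention.k.weight"),
    (r"^encoder/layers_(\d+)/attention/value/kernel$",
     r"encoder.block.\1.layer.0.SelfAttention.v.weight"),
]

def rename_param(name: str) -> str:
    """Map one JAX-style parameter name to an HF-style key."""
    for pattern, repl in RENAMES:
        new, count = re.subn(pattern, repl, name)
        if count:
            return new
    return name  # leave unmapped names untouched for manual inspection

def rename_state_dict(params: dict) -> dict:
    """Apply the renaming to a whole flat parameter dict."""
    return {rename_param(k): v for k, v in params.items()}
```

In practice one would also transpose kernels where JAX and PyTorch use different weight layouts before saving with `torch.save` or `save_pretrained`.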
Hey! Thanks for opening this issue. The weights will be available on the Hub soon! We are converting them with @younesbelkada
The model is already out! (https://huggingface.co/google/flan-ul2) @younesbelkada has a space comparing Flan-T5-XXL and Flan-UL2 here: https://huggingface.co/spaces/ybelkada/i-like-flan-ul2
Model description
UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 also introduces the notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.
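Mode switching in the UL2 paper works by prepending a sentinel token to the input that tells the model which denoising regime to operate in. A minimal sketch, assuming the paper's pairing of denoisers to mode tokens (R-denoising → `[NLU]`, X-denoising → `[NLG]`, S-denoising → `[S2S]`):

```python
# Assumed mapping of UL2 denoiser types to their mode tokens,
# per the UL2 paper; verify against the released tokenizer.
MODE_TOKENS = {
    "R": "[NLU]",  # regular span-corruption denoising
    "X": "[NLG]",  # extreme (long-span / high-rate) denoising
    "S": "[S2S]",  # sequential (prefix-LM style) denoising
}

def with_mode(text: str, denoiser: str) -> str:
    """Prefix an input with the mode token for the chosen denoiser."""
    return f"{MODE_TOKENS[denoiser]} {text}"
```

Note that for FLAN-UL2 specifically, the instruction-tuned checkpoint is reported to work without mode tokens, so this prefixing mainly applies to the original UL2 20B model.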
FLAN-UL2 has the same configuration as the original UL2 20B model, except that it has been instruction tuned with Flan.
Open source status
Provide useful links for the implementation
The model architecture (UL2) is already in Hugging Face Transformers. The 20B model weights are here: https://github.com/google-research/google-research/tree/master/ul2#checkpoints