Add UL2: Unifying Language Learning Paradigms

huggingface / transformers

Add UL2: Unifying Language Learning Paradigms #17207

Closed NielsRogge closed 2 years ago

NielsRogge commented 2 years ago

Model description

UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms together. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.
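To make the Mixture-of-Denoisers idea concrete, here is a minimal sketch of how MoD training examples might be generated. It assumes word-level tokens, T5-style `<extra_id_N>` sentinels, and paradigm tokens `[R]`, `[X]`, `[S]` prepended to the input for mode switching. The uniform mixture and the rate/span settings are illustrative assumptions loosely following the paper's R-, X- and S-denoisers. They are not the configuration used for the released checkpoints, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Tokens = List[str]

# ASSUMPTION: illustrative mixture loosely after the UL2 paper's R/X/S denoisers.
# Mode tokens, rates, span lengths and uniform sampling are placeholders.
# Each entry: (mode token, kind, corruption rate, mean span length)
DENOISERS: List[Tuple[str, str, Optional[float], Optional[float]]] = [
    ("[R]", "span", 0.15, 3.0),     # regular: short spans, low corruption
    ("[X]", "span", 0.50, 32.0),    # extreme: long spans, heavy corruption
    ("[S]", "prefix", None, None),  # sequential: prefix language modeling
]


def random_segmentation(total: int, num_segments: int, rng: random.Random) -> List[int]:
    """Split `total` items into `num_segments` non-empty segments of random length."""
    cuts = sorted(rng.sample(range(1, total), num_segments - 1))
    bounds = [0] + cuts + [total]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def span_corrupt(tokens: Tokens, rate: float, mean_span: float,
                 rng: random.Random) -> Tuple[Tokens, Tokens]:
    """T5-style span corruption: each masked span becomes a sentinel in the
    input, and the target lists every sentinel followed by the tokens it hid."""
    n = len(tokens)
    assert n >= 2, "need at least one kept and one masked token"
    num_noise = min(n - 1, max(1, round(n * rate)))
    num_spans = max(1, round(num_noise / mean_span))
    num_spans = min(num_spans, num_noise, n - num_noise)
    noise_lens = random_segmentation(num_noise, num_spans, rng)
    keep_lens = random_segmentation(n - num_noise, num_spans, rng)
    inputs: Tokens = []
    targets: Tokens = []
    pos = 0
    # Alternate kept and masked spans, starting with a kept span.
    for i, (keep, noise) in enumerate(zip(keep_lens, noise_lens)):
        sentinel = f"<extra_id_{i}>"
        inputs += tokens[pos:pos + keep] + [sentinel]
        pos += keep
        targets += [sentinel] + tokens[pos:pos + noise]
        pos += noise
    return inputs, targets


def prefix_lm(tokens: Tokens, rng: random.Random) -> Tuple[Tokens, Tokens]:
    """S-denoiser: the input is a random prefix, the target is the remaining suffix."""
    split = rng.randint(1, len(tokens) - 1)
    return tokens[:split], tokens[split:]


def mixture_of_denoisers(tokens: Tokens, rng: random.Random) -> Tuple[Tokens, Tokens]:
    """Sample one denoiser and prepend its (assumed) mode token to the input."""
    mode, kind, rate, mean_span = rng.choice(DENOISERS)
    if kind == "span":
        inputs, targets = span_corrupt(tokens, rate, mean_span, rng)
    else:
        inputs, targets = prefix_lm(tokens, rng)
    return [mode] + inputs, targets
```

Calling `mixture_of_denoisers(text.split(), random.Random(0))` repeatedly yields a mix of `[R]`, `[X]` and `[S]` examples. In this sketch, mode switching at fine-tuning time would amount to prepending the token of the paradigm that best matches the downstream task.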

Open source status

[X] The model implementation is available
[X] The model weights are available

Provide useful links for the implementation

Code and weights (20-billion-parameter models): https://github.com/google-research/google-research/tree/master/ul2
The code is based on T5x (which is JAX/Flax): https://github.com/google-research/t5x

patrickvonplaten commented 2 years ago

cc @stefan-it @peregilk @agemagician @stancld @edugp FYI, this might be interesting for you as well :-)

manuelciosici commented 2 years ago

Is anyone working on porting UL2 to transformers already? If not, I am interested in porting it.

patrickvonplaten commented 2 years ago

Hey @manuelciosici, I think @kamalkraj is working on it. Maybe you two can sync on how to collaborate? :-)

Happy to help in any way!

kamalkraj commented 2 years ago

Hi @manuelciosici,

I am trying to understand the t5x library and loading the model.

We can work together. You can ping me on Slack/Discord.

peregilk commented 2 years ago

@manuelciosici and @kamalkraj. I am about to start some UL2 training in t5x. I might also contribute here.

stancld commented 2 years ago

Hello @kamalkraj, regarding the t5x library (loading the model, etc.), I've done some inference with the LongT5 model in my repo here.

kamalkraj commented 2 years ago

Thank you so much @stancld

kamalkraj commented 2 years ago

> @manuelciosici and @kamalkraj. I am about to start some UL2 training in t5x. I might also contribute here.

Hi @manuelciosici, did you start fine-tuning? Did you identify the T5 gin file required for it?

They have only released the UL2 gin file, not the full set: https://github.com/google-research/google-research/issues/1101

manuelciosici commented 2 years ago

@kamalkraj Unfortunately, I was handed a tight deadline, so I won't be able to look into UL2 until July.

patrickvonplaten commented 2 years ago

No worries! Is anybody interested in taking over the UL2 implementation? Would be happy to help :-)

haileyschoelkopf commented 2 years ago

I can take a stab at this in the next week if no one else is actively working on it!

I'll hopefully open a PR soon. Help is welcome from anyone who would like to join as well :)

DanielHesslow commented 2 years ago

I've had the model running locally for a while but didn't get around to pushing it to the hub until now 😅

With #17420 merged into master, the architecture is already supported (in 4.20).

I've put the weights here for now: https://huggingface.co/Seledorn/ul2

I think what remains is mostly verifying that the port produces identical outputs to the original model. But, as @kamalkraj noted, this is a bit difficult without the complete gin files. That said, the model does give me reasonable outputs, so I believe the conversion is at least mostly correct.
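Verifying a port usually comes down to running both models on identical input ids and comparing their logits. Below is a minimal sketch of that comparison. It assumes you have already exported the logits from the original T5x model and from the Transformers port as arrays of shape `(seq_len, vocab_size)`. The helper name `compare_logits` and the default tolerance of `1e-3` are illustrative choices, not anything used in this thread.

```python
import numpy as np


def compare_logits(reference, ported, atol: float = 1e-3) -> dict:
    """Compare logits from the original model and the port for the same inputs.

    ASSUMPTION: both inputs have shape (seq_len, vocab_size) and were produced
    from identical input ids; `atol` is an illustrative tolerance.
    """
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(ported, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {out.shape}")
    max_abs_diff = float(np.max(np.abs(ref - out)))
    # Fraction of positions where both models would greedily pick the same token.
    argmax_agreement = float(np.mean(ref.argmax(axis=-1) == out.argmax(axis=-1)))
    return {
        "max_abs_diff": max_abs_diff,
        "argmax_agreement": argmax_agreement,
        "allclose": max_abs_diff <= atol,
    }
```

Reporting argmax agreement alongside the raw difference helps separate two cases. A port can match closely enough to produce the same greedy outputs while still drifting numerically, or it can be genuinely wrong.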

patrickvonplaten commented 2 years ago

That's amazing @DanielHesslow - I'll check them out this week!

patrickvonplaten commented 2 years ago

Great job on porting the model!

Sea-Snell commented 2 years ago

Google released weights for 3 UL2 checkpoints. I'm assuming the model on Hugging Face corresponds to the last checkpoint, but just to make sure: is that correct?

patrickvonplaten commented 2 years ago

Yes, that's true as far as I know! cc @DanielHesslow just to be sure, as he ported the checkpoint :-)

DanielHesslow commented 2 years ago

Yeah it's the latest one

gaceladri commented 1 year ago

Hello :hand: Are you aware of any implementation of the Mixture-of-Denoisers loss, preferably with HF compatibility? Thanks in any case!

patrickvonplaten commented 1 year ago

We haven't added this one yet - would you like to open a feature request / PR for it maybe? :-)
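For a denoising objective, "HF compatibility" mostly means producing `input_ids`/`labels` pairs that a seq2seq model such as `T5ForConditionalGeneration` accepts. The loss itself is then the model's usual cross-entropy, which skips label positions set to -100. Below is a minimal sketch of that batching step. It assumes the examples are already tokenized into integer ids (for instance, by a Mixture-of-Denoisers sampler) and that the pad id is 0, as in T5 tokenizers. `collate_denoising_batch` is a hypothetical helper, not a Transformers API.

```python
from typing import Dict, List, Sequence, Tuple

IGNORE_INDEX = -100  # label value skipped by the cross-entropy loss in HF seq2seq models
PAD_ID = 0           # ASSUMPTION: T5-style tokenizer with pad id 0


def collate_denoising_batch(
    examples: Sequence[Tuple[List[int], List[int]]],
) -> Dict[str, List[List[int]]]:
    """Pad (input_ids, target_ids) pairs into a batch a seq2seq model can consume.

    Inputs are right-padded with PAD_ID and masked out via attention_mask;
    labels are right-padded with IGNORE_INDEX so padding adds nothing to the loss.
    """
    max_in = max(len(x) for x, _ in examples)
    max_out = max(len(y) for _, y in examples)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for x, y in examples:
        pad_in = max_in - len(x)
        batch["input_ids"].append(list(x) + [PAD_ID] * pad_in)
        batch["attention_mask"].append([1] * len(x) + [0] * pad_in)
        batch["labels"].append(list(y) + [IGNORE_INDEX] * (max_out - len(y)))
    return batch
```

Padding labels with -100 rather than the pad id keeps padding out of the loss. When `labels` are passed, T5-family models build `decoder_input_ids` themselves by shifting the labels right.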