NielsRogge closed this issue 2 years ago
cc @stefan-it @peregilk @agemagician @stancld @edugp FYI might be interesting for you as well :-)
Is anyone already working on porting UL2 to transformers? If not, I am interested in porting it.
Hey @manuelciosici - I think @kamalkraj is working on it. Maybe you two can sync on how to collaborate? :-)
Happy to help in any way!
Hi @manuelciosici,
I am trying to understand the t5x library and loading the model.
We can work together. You can ping me on slack/discord
@manuelciosici and @kamalkraj. I am about to start some UL2 training in t5x. I might also contribute here.
Hello @kamalkraj, regarding the t5x library (loading the model, etc.), I've done some inference with a LongT5 model in my repo here.
Thank you so much @stancld
Hi @manuelciosici, did you start fine-tuning? Did you identify the T5 gin file required for it?
They have only released the UL2 gin file, not the full set.
https://github.com/google-research/google-research/issues/1101
@kamalkraj Unfortunately, I was handed a tight deadline, so I won't be able to look into UL2 until July.
No worries! Anybody interested in taking over the UL2 implementation? Would be happy to help :-)
I can take a stab at this in the next week if no one else is actively working on it!
I'll hopefully open a PR soon - help is welcome from anyone who would like as well :)
I've had the model running locally for a while but didn't get around to pushing it to the hub until now 😅
With #17420 merged into master, the architecture is already supported (in 4.20).
I've put the weights here for now: https://huggingface.co/Seledorn/ul2
I think what remains is mostly verifying that we get identical output from the port and the original model. But, as @kamalkraj noted, this is a bit difficult without the complete gin files. The model does give me reasonable outputs, though, so I believe the conversion is at least mostly correct.
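The verification step could be sketched like this: run the ported and the original model on the same input and compare logits within a small tolerance (frameworks rarely agree bit-for-bit). The helper name and the dummy arrays below are illustrative, not part of any conversion script:

```python
import numpy as np

def outputs_match(hf_logits, t5x_logits, atol=1e-3):
    """Check that two logit arrays agree within an absolute tolerance."""
    hf = np.asarray(hf_logits, dtype=np.float32)
    ref = np.asarray(t5x_logits, dtype=np.float32)
    if hf.shape != ref.shape:
        return False
    return bool(np.allclose(hf, ref, atol=atol))

# Toy illustration with dummy arrays standing in for real model outputs:
a = np.array([[0.10, -1.20], [2.00, 0.50]])
b = a + 1e-4  # small numerical drift, as expected across frameworks
print(outputs_match(a, b))        # True
print(outputs_match(a, b + 1.0))  # False
```

In practice you would feed both models the same tokenized input and compare the decoder logits for the first few steps; without the full gin config, matching the exact preprocessing is the hard part.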
That's amazing @DanielHesslow - I'll check them out this week!
Great job on porting the model!
Google released weights for 3 UL2 checkpoints. I'm assuming the model on the Hugging Face Hub corresponds to the last checkpoint; just to make sure, is that correct?
Yes, that's true as far as I know! cc @DanielHesslow just to be sure, as he ported the checkpoint :-)
Yeah, it's the latest one.
Hello :hand: Are you aware of any implementation of the Mixture-of-Denoisers loss? Preferably with HF compatibility. Thanks in any case!
We haven't added this one yet - would you like to open a feature request / PR for it maybe? :-)
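For anyone picking this up: a minimal word-level sketch of the data side of Mixture-of-Denoisers might look like the following. Each example is corrupted by one randomly chosen denoiser (R = short spans / low corruption, X = long spans / heavy corruption, S = prefix-LM split) and tagged with a paradigm token; the usual cross-entropy on the targets then gives the MoD loss. The span lengths, rates, and token-to-denoiser mapping here are illustrative, not the paper's exact configuration, and the real objective operates on token ids with T5-style sentinels:

```python
import random

DENOISERS = {
    # mode token -> (mean span length in words, corruption rate)
    "[NLU]": (3, 0.15),   # R-denoiser: short spans, low corruption
    "[NLG]": (8, 0.50),   # X-denoiser: long spans, heavy corruption
    "[S2S]": None,        # S-denoiser: prefix-LM split, handled separately
}

def span_corrupt(words, span_len, rate, rng):
    """Mask roughly `rate` of the words in contiguous spans of `span_len`."""
    inputs, targets, i, sid = [], [], 0, 0
    while i < len(words):
        if rng.random() < rate:
            span = words[i:i + span_len]
            sentinel = f"<extra_id_{sid}>"   # T5-style sentinel placeholder
            inputs.append(sentinel)
            targets += [sentinel] + span
            i += span_len
            sid += 1
        else:
            inputs.append(words[i])
            i += 1
    return inputs, targets

def mod_example(text, rng):
    """Pick a denoiser at random and build (input, target) with its mode token."""
    mode = rng.choice(sorted(DENOISERS))
    words = text.split()
    if mode == "[S2S]":  # S-denoiser: condition on a prefix, predict the rest
        cut = max(1, len(words) // 2)
        return [mode] + words[:cut], words[cut:]
    span_len, rate = DENOISERS[mode]
    inp, tgt = span_corrupt(words, span_len, rate, rng)
    return [mode] + inp, tgt

rng = random.Random(0)
inp, tgt = mod_example("the quick brown fox jumps over the lazy dog", rng)
print(inp[0] in DENOISERS)  # True: the mode token always leads the input
```

Training then mixes examples from all three denoisers in one batch and optimizes ordinary seq2seq cross-entropy on the targets, so no custom loss function is strictly needed, only custom preprocessing.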
Model description
UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 also introduces the notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.
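Concretely, mode switching means prepending a paradigm token to the input so the model is steered toward the matching pre-training objective. A minimal sketch, assuming the token names used by the released checkpoint ([NLU], [NLG], [S2S]); the helper function and the task-style mapping in the comments are illustrative:

```python
# Mode switching: a paradigm token at the start of the input selects which
# denoising behavior the model should imitate. The mapping of token to task
# style below is an assumption for illustration, not an official API.
MODE_TOKENS = {
    "regular": "[NLU]",     # R-denoising mode
    "extreme": "[NLG]",     # X-denoising mode
    "sequential": "[S2S]",  # S-denoising (prefix-LM) mode
}

def with_mode(text: str, mode: str) -> str:
    """Prepend the paradigm token for the chosen denoising mode."""
    return f"{MODE_TOKENS[mode]} {text}"

print(with_mode("Translate to German: Hello!", "sequential"))
# [S2S] Translate to German: Hello!
```

At fine-tuning or inference time you would tokenize the prefixed string as usual; the model has learned during pre-training to associate each token with one denoising scheme.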
Open source status
Provide useful links for the implementation
Code and weights (20 billion parameter model): https://github.com/google-research/google-research/tree/master/ul2
The code is based on T5X (which is JAX/FLAX): https://github.com/google-research/t5x