mooijtech opened 1 year ago
Not sure if T5 is compatible with BART; hopefully it is, since both are encoder-decoder models. There seem to be some config.json differences, which I'm trying to modify now.
Currently stuck on the input encoding embeddings.
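If it helps, one quick way to see the config.json differences is to diff the two configs via the transformers library. This is just a sketch for inspection; the model IDs below are examples, not necessarily the checkpoints in question.

```python
from transformers import AutoConfig

# Example checkpoints; swap in whichever BART/T5 models you are comparing.
bart = AutoConfig.from_pretrained("facebook/bart-base").to_dict()
t5 = AutoConfig.from_pretrained("t5-small").to_dict()

only_bart = sorted(set(bart) - set(t5))
only_t5 = sorted(set(t5) - set(bart))
shared_diff = sorted(k for k in set(bart) & set(t5) if bart[k] != t5[k])

print("keys only in BART:", only_bart)
print("keys only in T5:  ", only_t5)
print("shared keys with different values:", shared_diff)
```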
T5 uses an encoder-decoder architecture that closely resembles the original transformer. The differences are (sketched in code below):

- LayerNorm is applied immediately before each attention and feed-forward transformation (i.e., outside of the residual path).
- No additive bias is used for LayerNorm; only the scale parameter is kept.
- A simple relative position embedding scheme is used that adds a scalar to the corresponding logit used to compute attention weights.
- Dropout is applied throughout the network (e.g., on attention weights, in the feed-forward network, on skip connections, etc.).
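To make that list concrete, here is a minimal PyTorch sketch of the two T5-specific pieces (scale-only LayerNorm applied pre-attention, and the relative-position scalar bias added to attention logits). Class and parameter names are mine, not from any existing codebase, and T5's logarithmic bucketing of relative distances is simplified to plain clipping here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class T5StyleLayerNorm(nn.Module):
    """Scale-only LayerNorm: no mean subtraction and no additive bias."""
    def __init__(self, d_model, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(d_model))
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)

class PreNormRelativeSelfAttention(nn.Module):
    """Pre-norm self-attention block: LayerNorm is applied before attention
    (outside the residual path), and a learned scalar per (relative distance,
    head) is added to each attention logit."""
    def __init__(self, d_model, n_heads, max_distance=128, dropout=0.1):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.norm = T5StyleLayerNorm(d_model)
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        self.o = nn.Linear(d_model, d_model, bias=False)
        # One learned scalar per (clipped relative distance, head).
        self.rel_bias = nn.Embedding(2 * max_distance + 1, n_heads)
        self.max_distance = max_distance
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        b, t, d = x.shape
        h = self.norm(x)  # pre-norm, outside the residual path
        q = self.q(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v(h).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        logits = q @ k.transpose(-2, -1)  # (b, heads, t, t)
        # Relative position bias: a scalar added to the corresponding logit.
        pos = torch.arange(t, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_distance, self.max_distance)
        bias = self.rel_bias(rel + self.max_distance)         # (t, t, heads)
        logits = logits + bias.permute(2, 0, 1).unsqueeze(0)  # (1, heads, t, t)
        attn = self.dropout(F.softmax(logits, dim=-1))        # dropout on attention weights
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return x + self.dropout(self.o(out))  # residual connection around the block
```

Quick smoke test: `PreNormRelativeSelfAttention(512, 8)(torch.randn(2, 10, 512))` returns a `(2, 10, 512)` tensor.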
@mooijtech I am ready to work on this together, let me know if you're still interested :)
Hello Matteo,
I have lost access to my GitHub account due to the great new 2FA requirement (replying via email should still work I guess).
I am not currently interested, as I've switched direction. If I really wanted to do this I would have done it by now; when I want to do something, there's nothing that can stop me, since the internet knows everything :)
Kind regards, Marten
I would like to use the following model(s): https://huggingface.co/google/flan-t5-small https://huggingface.co/google/flan-t5-xxl
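For reference, these checkpoints load with the standard transformers seq2seq API; this is just how they are used upstream, not code from this project.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# flan-t5-small is shown here; flan-t5-xxl loads the same way but is much larger.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

inputs = tokenizer("translate English to German: Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```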
What would be required to add support if I were to look at contributing myself?
Kind regards, Marten