nlpodyssey / cybertron

Cybertron: the home planet of the Transformers in Go

Support `google/flan-t5-*` #20

Open mooijtech opened 1 year ago

mooijtech commented 1 year ago

I would like to use the following model(s):

- https://huggingface.co/google/flan-t5-small
- https://huggingface.co/google/flan-t5-xxl

What would be required to add support, if I were to contribute it myself?

Kind regards, Marten

mooijtech commented 1 year ago

Not sure whether T5 is compatible with BART; hopefully it is, since both are encoder-decoder models. There seem to be some config.json differences; I'm trying to modify it now.

mooijtech commented 1 year ago

Stuck on input encoding embeddings.
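
If it helps anyone picking this up: T5's input encoding is just a token-embedding lookup. Unlike BART, no positional embedding is added at the input; position information enters later as a relative bias on the attention logits. A minimal sketch, with illustrative names rather than Cybertron's actual API:

```go
package main

import "fmt"

// embed looks up one d_model-sized vector per token ID.
// In T5 this lookup is the whole input encoding: no positional
// embedding is added here (positions are handled inside attention).
func embed(table [][]float32, ids []int) [][]float32 {
	out := make([][]float32, len(ids))
	for i, id := range ids {
		out[i] = table[id] // plain lookup; nothing position-dependent
	}
	return out
}

func main() {
	// Toy vocabulary of 4 tokens with d_model = 2.
	table := [][]float32{{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}, {0.7, 0.8}}
	fmt.Println(embed(table, []int{2, 0, 3}))
}
```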

mooijtech commented 1 year ago

T5 uses an encoder-decoder architecture that closely resembles the original transformer. The differences are:

- LayerNorm is applied immediately before each attention and feed-forward transformation (i.e., outside of the residual path).
- No additive bias is used for LayerNorm; only a scale is learned (see the norm sketch after this list).
- A simple position embedding scheme adds a scalar to the corresponding logit used to compute attention weights (see the bias sketch after this list).
- Dropout is applied throughout the network (e.g., on attention weights, the feed-forward network, skip connections, etc.).
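
The first two points amount to an RMS-style normalization applied pre-residual. A minimal sketch of the norm itself, hand-rolled for illustration rather than taken from Cybertron, and assuming the usual epsilon of 1e-6:

```go
package main

import (
	"fmt"
	"math"
)

// t5LayerNorm rescales x by the inverse root mean square of its values
// and applies a learned per-dimension scale. There is no mean
// subtraction and no additive bias, matching the first two points above.
func t5LayerNorm(x, scale []float64, eps float64) []float64 {
	var meanSq float64
	for _, v := range x {
		meanSq += v * v
	}
	meanSq /= float64(len(x))
	inv := 1.0 / math.Sqrt(meanSq+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * inv * scale[i] // scale only; no bias term
	}
	return out
}

func main() {
	x := []float64{1.0, -2.0, 3.0}
	scale := []float64{1.0, 1.0, 1.0}
	fmt.Println(t5LayerNorm(x, scale, 1e-6))
}
```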

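And a minimal sketch of the position scheme from the third point, again with illustrative names rather than Cybertron's API: each attention logit gets a learned scalar that depends only on the relative distance between the query and key positions. (Real T5 also buckets distances logarithmically; that detail is omitted here.)

```go
package main

import "fmt"

// addRelativeBias adds one learned scalar per relative offset (k - q)
// to the raw attention logits, before softmax.
func addRelativeBias(logits [][]float64, bias map[int]float64) {
	for q := range logits {
		for k := range logits[q] {
			logits[q][k] += bias[k-q] // one scalar per relative distance
		}
	}
}

func main() {
	logits := [][]float64{{0, 0, 0}, {0, 0, 0}, {0, 0, 0}}
	bias := map[int]float64{-2: -0.2, -1: -0.1, 0: 0.0, 1: 0.1, 2: 0.2}
	addRelativeBias(logits, bias)
	fmt.Println(logits)
}
```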

matteo-grella commented 9 months ago

@mooijtech I am ready to work on this together, let me know if you're still interested :)

mooijtech commented 9 months ago

Hello Matteo,

I have lost access to my GitHub account due to the great new 2FA requirement (replying via email should still work I guess).

I am not currently interested, as I've switched direction. If I really wanted to do this I would have done it by now; when I want to do something, there's nothing that can stop me, as the internet knows everything :)

Kind regards, Marten