win10ogod opened 1 year ago
Please send me the implementation if you have one.
If you're after a generative GPT, a bidirectional model isn't really possible, since a bidirectional model needs the complete sentence up front. If what you're looking for is instead a classification model, the encoder block in the Transformer architecture is much more efficient than a bidirectional LSTM, since every token in the sequence can attend to every other token.
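To make the "every token can attend to every other token" point concrete, here is a minimal sketch (assuming PyTorch, with made-up shapes): the only difference between GPT-style decoder attention and encoder-style bidirectional attention is the mask.

```python
# Minimal sketch, assuming PyTorch. The difference between "GPT-style" and
# "encoder-style" attention is only the mask: causal attention lets token t
# see tokens <= t, while unmasked attention lets every token see every token.
import torch
import torch.nn.functional as F

T, d = 5, 8                       # sequence length, head dimension (arbitrary)
q = torch.randn(T, d)
k = torch.randn(T, d)
v = torch.randn(T, d)

scores = q @ k.T / d ** 0.5       # (T, T) attention scores

# Decoder / GPT: causal mask.
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
decoder_attn = F.softmax(scores.masked_fill(~causal, float("-inf")), dim=-1)

# Encoder / bidirectional: no mask.
encoder_attn = F.softmax(scores, dim=-1)

out_decoder = decoder_attn @ v
out_encoder = encoder_attn @ v
```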
I need a model architecture like BatGPT for experimentation. https://arxiv.org/abs/2307.00360 https://huggingface.co/MLP-lab/BatGPT-15B-sirius/tree/main
Create two special tokens, one for forward and one for backward. Always add one of them as the first token of the prompt, both during training and sampling.
When training: if the backward special token is the first token, reverse the text before adding it to the prompt and targets; if the forward special token is the first token, add the text to the prompt and targets as usual.
When sampling: if you want to sample a token "before" some text, put the backward special token first and then the reversed text as the prompt. The model will return the token that can be prepended to that text.
This way it's possible to generate and extend text in both directions from a starting prompt. It's a matter of pre-processing the training data and post-processing the sampled output.
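Here is a rough sketch of that pre/post-processing. The token names (`<fwd>`, `<bwd>`) and the character-level reversal are assumptions for illustration; in practice you would reverse at the token level with your own tokenizer, and `model_generate` is a stand-in for whatever sampling function you use.

```python
# Minimal sketch of the pre/post-processing described above.
# "<fwd>"/"<bwd>" and character-level reversal are illustrative assumptions.

FWD, BWD = "<fwd>", "<bwd>"

def make_training_example(text: str, direction: str) -> str:
    """Prepend the direction token; reverse the text for the backward case."""
    if direction == "backward":
        return BWD + text[::-1]
    return FWD + text

def extend_backward(model_generate, text: str, n_tokens: int) -> str:
    """Generate tokens that come *before* `text` by sampling in reverse.

    `model_generate(prompt, n)` is assumed to return only the newly
    generated continuation of the prompt.
    """
    prompt = BWD + text[::-1]
    generated = model_generate(prompt, n_tokens)   # continuation of reversed text
    return generated[::-1] + text                  # un-reverse and prepend

# Usage: train on a mix of make_training_example(t, "forward") and
# make_training_example(t, "backward"); at inference, extend_backward()
# grows the text to the left, while a normal forward prompt grows it to the right.
```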
Training a forward model or a backward model separately is straightforward. The part that isn't clear is that in the BatGPT paper they claim to minimize "the sum of forward loss and backward loss", and I'm not quite sure how such a forward+backward loss can be minimized in one unidirectional generative model. If they use a Transformer encoder, which is bidirectional, to compute the loss, do they train with the masked-language-model objective? They don't mention this in the paper either.
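One way to read "forward loss + backward loss" without introducing an encoder is to compute the usual causal next-token loss twice in the same decoder-only model, once on the sequence and once on its reversal (each prefixed with its direction token), and sum the two. This is only a guess at what the paper means, not their documented procedure; the model interface and direction-token ids below are assumptions.

```python
# Hypothetical sketch, assuming PyTorch and a decoder-only `model` whose
# forward pass returns logits of shape (B, T, vocab). Padding is ignored here.
import torch
import torch.nn.functional as F

def next_token_loss(model, ids: torch.Tensor) -> torch.Tensor:
    """Standard causal LM loss: predict ids[:, 1:] from ids[:, :-1]."""
    logits = model(ids[:, :-1])
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        ids[:, 1:].reshape(-1),
    )

def forward_backward_loss(model, tokens, fwd_id: int, bwd_id: int):
    """Sum of the causal losses on the sequence and on its reversal."""
    B = tokens.size(0)
    fwd_tok = torch.full((B, 1), fwd_id, dtype=tokens.dtype, device=tokens.device)
    bwd_tok = torch.full((B, 1), bwd_id, dtype=tokens.dtype, device=tokens.device)
    fwd = torch.cat([fwd_tok, tokens], dim=1)
    bwd = torch.cat([bwd_tok, tokens.flip(dims=[1])], dim=1)
    return next_token_loss(model, fwd) + next_token_loss(model, bwd)
```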
Does anyone have an implementation of a bidirectional GPT model, like a bi-LSTM?