lopuhin / transformer-lm

Transformer language model (GPT-2) with sentencepiece tokenizer
164 stars 47 forks

How to prepare the data for a text generation task. Thank you very much. #1

Closed by guotong1988 5 years ago

guotong1988 commented 5 years ago

First, I'm not sure whether the model contains an encoder during training.

EOS means end-of-sentence.

Without an encoder, at training time:

target: [E, F, G, H, EOS]
decoder input: [0, E, F, G, H]

Without an encoder, at test time:

decoder input: [0]
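The "without-encoder" layout above can be sketched in a few lines of Python. This is a minimal illustration, not code from this repo: the string tokens `E`–`H` and the `<s>`/`</s>` markers stand in for the `0` and `EOS` placeholders in the lists above.

```python
# Placeholder special tokens: "<s>" plays the role of "0" above, "</s>" of EOS.
BOS, EOS = "<s>", "</s>"

def make_lm_pair(tokens):
    """Build a teacher-forcing pair for a decoder-only LM:
    the input is the target shifted right by one position."""
    decoder_input = [BOS] + tokens   # [<s>, E, F, G, H]
    target = tokens + [EOS]          # [E, F, G, H, </s>]
    return decoder_input, target

inp, tgt = make_lm_pair(["E", "F", "G", "H"])
print(inp)  # ['<s>', 'E', 'F', 'G', 'H']
print(tgt)  # ['E', 'F', 'G', 'H', '</s>']
# At test time, generation starts from the decoder input [<s>] alone.
```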

With an encoder, at training time:

encoder input: [A, B, C, D]
target: [E, F, G, H, EOS]
decoder input: [0, E, F, G, H]

With an encoder, at test time:

encoder input: [A, B, C, D]
decoder input: [0]
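The "with-encoder" (encoder-decoder, seq2seq-style) layout differs only in that the source sequence goes to the encoder unchanged, while the decoder side is shifted exactly as in the decoder-only case. A minimal sketch, again with placeholder string tokens rather than code from this repo:

```python
# Placeholder special tokens, as in the lists above.
BOS, EOS = "<s>", "</s>"

def make_seq2seq_example(source, target_tokens):
    """Build one encoder-decoder training example:
    encoder sees the source as-is; decoder input is the target shifted right."""
    encoder_input = list(source)             # [A, B, C, D]
    decoder_input = [BOS] + target_tokens    # [<s>, E, F, G, H]
    target = target_tokens + [EOS]           # [E, F, G, H, </s>]
    return encoder_input, decoder_input, target

enc, dec, tgt = make_seq2seq_example(["A", "B", "C", "D"], ["E", "F", "G", "H"])
print(enc)  # ['A', 'B', 'C', 'D']
print(dec)  # ['<s>', 'E', 'F', 'G', 'H']
print(tgt)  # ['E', 'F', 'G', 'H', '</s>']
# At test time the encoder input stays the same and the decoder starts from [<s>].
```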

Am I exactly right?

I know it is beyond the topic of this project, but I hope you could help. Thank you and thank you.

lopuhin commented 5 years ago

Hi @guotong1988 sorry I'm completely lost, could you please provide more context? What is "with-encoder"/"without-encoder"? What is decoder here? If this is a general question not related to this project, then maybe stackoverflow would be a better place.

guotong1988 commented 5 years ago

Thank you for your reply.

With-encoder means we use a transformer encoder-decoder to do the language modelling (LM) task for text generation.

Without-encoder means using only the transformer decoder to do the language modelling (LM) task for text generation.

I just came up with this question, but nobody answered it on stackoverflow.com.

lopuhin commented 5 years ago

@guotong1988 thanks, could you also clarify which encoder and decoder you have in mind? Asking because they can mean different things: one view is to treat the BPE tokenizer as the encoder and decoder, and another is to treat some part of the transformer network as the encoder/decoder (e.g. this terminology is used in machine translation).

guotong1988 commented 5 years ago

I mean 'treat some part of the transformer network'. Thank you.

guotong1988 commented 5 years ago

Thank you for your reply. I will be stricter about the definitions of encoder and decoder from now on.

lopuhin commented 5 years ago

Thanks for the clarification @guotong1988 . Regarding your original question, I think I understand what you mean by "without-encoder": it is the standard language-modelling case, and I think you are correct. I'm afraid I won't be able to help with the "with-encoder" case though.
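To make the "testing time: decoder input [0]" part of the decoder-only case concrete, here is a toy sketch of autoregressive generation: start from the begin marker and repeatedly append the model's next prediction until EOS. The `predict_next` function and the lookup table are purely illustrative stand-ins for a real model, not anything from this repo.

```python
# Placeholder special tokens; "<s>" plays the role of "0" in the question.
BOS, EOS = "<s>", "</s>"

def generate(predict_next, max_len=10):
    """Greedy autoregressive decoding for a decoder-only LM.
    predict_next(seq) stands in for a real model's argmax over the vocab."""
    seq = [BOS]
    while len(seq) < max_len:
        nxt = predict_next(seq)
        if nxt == EOS:
            break
        seq.append(nxt)
    return seq[1:]  # drop the BOS marker

# Toy "model" that deterministically follows a fixed table.
table = {"<s>": "E", "E": "F", "F": "G", "G": "H", "H": "</s>"}
print(generate(lambda seq: table[seq[-1]]))  # ['E', 'F', 'G', 'H']
```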