Some question about the "LM Adaptation"

kipgparker / soft-prompt-tuning

MIT License

339 stars 44 forks source link

Hello!

Sorry to bother you. After reading this great work, I have a question about the "LM Adaptation" setting in the paper. In my opinion, this adaptation is used for decoder-based model architecture. How can we use it for encoder-decoder-based model? And do you use the same max sequence length 512 and batch size 128 as the original T5 paper? In addition, as the sentence length in C4 is usually less than the max sequence length, do you combine several different sentences to form a longer sentence whose length is the max sequence length, then divide it into input and target?

Hope that you can give me some advice. Thanks in advance for any help you can give.

kipgparker / soft-prompt-tuning

Some question about the "LM Adaptation" #1