I wonder if you guys tried simply averaging the embeddings of the title to initialize the continuous prompts. I feel that could be a simpler solution to the problem you mentioned in the paper. I want to try something with the ideas in the paper, but the two-stage training kind of scares me away (I only have limited time for a project).
Do you guys think "averaging the embeddings of the title to initialize the continuous prompts" is a valid idea?
You can give it a shot. I think titles, since they consist of real words, are more compatible with the model than random embeddings. This could make the model converge faster and shorten the training time.
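If it helps, here is a minimal sketch of that initialization, assuming a Hugging Face backbone; the model name (`gpt2`) and `prompt_length` are placeholders, not what the paper uses:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# Placeholder backbone; swap in whatever model you are actually using.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

title = "An Example Title Used To Seed The Prompt"
token_ids = tokenizer(title, return_tensors="pt").input_ids[0]

# Look up the model's word embeddings for the title tokens.
word_embeddings = model.get_input_embeddings()  # nn.Embedding
with torch.no_grad():
    title_vecs = word_embeddings(token_ids)     # (num_tokens, hidden_dim)
    init_vec = title_vecs.mean(dim=0)           # average over title tokens

# Initialize every continuous prompt vector with the averaged embedding;
# these parameters are then trained as usual.
prompt_length = 10  # hypothetical; tune for your setup
continuous_prompt = nn.Parameter(init_vec.unsqueeze(0).repeat(prompt_length, 1))
```

The averaged vector at least starts the prompt in the region of embedding space the model already understands, which is the intuition behind preferring it over random initialization.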