ucinlp / autoprompt

AutoPrompt: Automatic Prompt Construction for Masked Language Models.
Apache License 2.0

How to run the code -- queries #36

Closed SageAgastya closed 2 years ago

SageAgastya commented 3 years ago

@rloganiv @taylorshin I am using this model to generate prompts for sentiment analysis on IMDB. I ran this command to generate trigger tokens:

python -m autoprompt.create_trigger \
    --train glue_data/SST-2/train.tsv \
    --dev glue_data/SST-2/dev.tsv \
    --template '<s> {sentence} [T] [T] [T] [P] . </s>' \
    --label-map '{"0": ["Ġworse", "Ġincompetence", "ĠWorse", "Ġblamed", "Ġsucked"], "1": ["ĠCris", "Ġmarvelous", "Ġphilanthrop", "Ġvisionary", "Ġwonderful"]}' \
    --num-cand 100 \
    --accumulation-steps 30 \
    --bsz 24 \
    --eval-size 48 \
    --iters 180 \
    --model-name roberta-large

My first question: will the generated trigger tokens be the same for all IMDB reviews?

Secondly, I also ran the code to generate label tokens (as you suggest in the README), so I now have trigger tokens and label tokens generated by your model. What is the next step? How do I prompt the LM to generate labels?
Also, is the set of label tokens generated by your model the same for all IMDB reviews in the dataset?

In the command for generating labels (from the README), should I replace the [T] trigger placeholders with the trigger tokens generated by the model?

I urgently need these answers and request that the authors look into these queries.
Thanks.
rloganiv commented 3 years ago

Hi @SageAgastya,

  1. The trigger and label tokens are the same for all instances.
  2. When generating the labels you can either use trigger tokens or manually write a prompt. There is a bit of a chicken-and-egg situation: you either need to find label tokens first with a prompt, or a prompt first with arbitrary label tokens.
  3. For evaluation, we just initialized the prompt with the AutoPrompt returned at the end of training, then substituted the dev set with the test set and set --iters=0 (see the sketch below for what the prompting itself looks like).
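
In case it is useful, here is a minimal sketch (not the evaluation code in this repo) of what prompting the MLM by hand looks like once you have trigger tokens and a label map: fill the [T] slots with the learned triggers, put <mask> in the [P] slot, and score each class by the probability the model assigns to that class's label tokens at the mask position. The trigger_tokens values below are hypothetical placeholders, not triggers produced by the command above.

```python
# Minimal illustrative sketch, assuming a learned trigger + label map for
# '<s> {sentence} [T] [T] [T] [P] . </s>' with roberta-large.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

# Placeholder trigger tokens -- substitute the ones create_trigger prints.
trigger_tokens = ["Ġoverall", "Ġreally", "Ġvery"]
label_map = {
    "0": ["Ġworse", "Ġincompetence", "ĠWorse", "Ġblamed", "Ġsucked"],
    "1": ["ĠCris", "Ġmarvelous", "Ġphilanthrop", "Ġvisionary", "Ġwonderful"],
}

def classify(sentence):
    # Build '<s> {sentence} [T] [T] [T] <mask> . </s>' as token ids.
    sentence_ids = tokenizer.encode(sentence, add_special_tokens=False)
    trigger_ids = tokenizer.convert_tokens_to_ids(trigger_tokens)
    input_ids = torch.tensor([
        [tokenizer.bos_token_id]
        + sentence_ids
        + trigger_ids
        + [tokenizer.mask_token_id]
        + tokenizer.encode(" .", add_special_tokens=False)
        + [tokenizer.eos_token_id]
    ])
    mask_pos = (input_ids == tokenizer.mask_token_id).nonzero()[0, 1]

    with torch.no_grad():
        logits = model(input_ids).logits[0, mask_pos]
    log_probs = torch.log_softmax(logits, dim=-1)

    # Score each class by the total probability of its label tokens at the
    # mask position (computed in log space), and return the best class.
    scores = {}
    for label, tokens in label_map.items():
        ids = tokenizer.convert_tokens_to_ids(tokens)
        scores[label] = torch.logsumexp(log_probs[ids], dim=0).item()
    return max(scores, key=scores.get)

print(classify("This movie was an absolute delight from start to finish."))
```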

Best,

@rloganiv