Closed — MatthewCYM closed this issue 1 year ago
Hi,
There are multiple factors affecting system performance, such as the number of processes and your CPU/GPU computing power. More importantly, the way you pick prompts is another factor, such as the number of steps. In our case, we pick prompts once the reward shows no significant improvement; this can also reduce computing time when averaging over random seeds.
Since it is a clarification, I am closing this now.
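The stopping rule described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name, `patience`, and `min_delta` thresholds are all hypothetical, and the reward values are assumed to already be averaged over random seeds.

```python
def should_stop(reward_history, patience=5, min_delta=1e-3):
    """Hypothetical early-stopping check: return True when the reward
    (averaged over random seeds) has not improved by at least `min_delta`
    over the last `patience` evaluations."""
    if len(reward_history) <= patience:
        # Not enough history yet to judge a plateau.
        return False
    best_recent = max(reward_history[-patience:])
    best_before = max(reward_history[:-patience])
    # Stop if the recent window brought no significant improvement.
    return best_recent < best_before + min_delta

# Example: reward plateaus at 0.51, so prompt picking would stop here.
rewards = [0.10, 0.30, 0.50, 0.51, 0.51, 0.51, 0.51, 0.51, 0.51]
print(should_stop(rewards))  # True
```

Under this kind of rule, wall-clock time depends heavily on how quickly the reward plateaus on your hardware, which is one reason total training time can vary so much across setups.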
Thanks for the quick reply. I ran the code with
Training takes around 1 day to complete on a single RTX3090, which is much longer than the training time reported in the paper (4 hours). May I ask if this is normal?
I also tried running the code with the GPT2 backbone:
The eval accuracy is only 62.5. Have you experimented with GPT2 on the classification task?