timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Example of usage #4

Closed · summerstay closed this issue 3 years ago

summerstay commented 3 years ago

Would you mind sharing exactly what to do to download the training data and do training and evaluation for one example? I'm having trouble figuring out exactly how things should be arranged and what to put for the command line parameters. Any example would do, as long as I can just follow the instructions.

timoschick commented 3 years ago

Hi @summerstay, I am currently on vacation but I'll do my best to share a detailed description early next week.

timoschick commented 3 years ago

Hi @summerstay, here's an example with two different tasks: RTE from SuperGLUE/FewGLUE and AG's News. Please let me know if this is sufficient or if you need additional details/instructions :)

Step 1: Requirements

1) Download the latest version of PET: git clone https://github.com/timoschick/pet.git
2) Go to the PET directory: cd pet
3) Create and activate a new virtual environment: python3 -m venv venv and source venv/bin/activate
4) Install all requirements: pip install -r requirements.txt

Step 2: Download Task-specific Data

RTE

1) Go to https://github.com/timoschick/fewglue to download FewGLUE/RTE/train.jsonl and FewGLUE/RTE/unlabeled.jsonl.
2) Go to https://dl.fbaipublicfiles.com/glue/superglue/data/v2/RTE.zip to download the original RTE data.
3) Create a new folder (let's call it rte-data) where you place train.jsonl and unlabeled.jsonl from FewGLUE as well as val.jsonl and test.jsonl from the original SuperGLUE data (see the sketch below if you want to script this part).
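If you want to script steps 2) and 3), here is a minimal Python sketch. It assumes the SuperGLUE archive unpacks into an RTE/ folder and that you have already downloaded the two FewGLUE files into the current directory; both are assumptions, so adjust the paths to your setup:

import os
import shutil
import urllib.request
import zipfile

# download and unpack the original SuperGLUE RTE data
urllib.request.urlretrieve("https://dl.fbaipublicfiles.com/glue/superglue/data/v2/RTE.zip", "RTE.zip")
with zipfile.ZipFile("RTE.zip") as archive:
    archive.extractall(".")  # assumption: this creates an RTE/ folder containing val.jsonl and test.jsonl

# assemble the rte-data folder described above
os.makedirs("rte-data", exist_ok=True)
shutil.copy("RTE/val.jsonl", "rte-data/val.jsonl")
shutil.copy("RTE/test.jsonl", "rte-data/test.jsonl")

# train.jsonl and unlabeled.jsonl come from the FewGLUE repository (step 1)
shutil.copy("train.jsonl", "rte-data/train.jsonl")          # assumption: already downloaded from FewGLUE
shutil.copy("unlabeled.jsonl", "rte-data/unlabeled.jsonl")  # assumption: already downloaded from FewGLUE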

AG's News

1) Go to http://goo.gl/JyCnZq, then download and extract ag_news_csv.tar.gz into a new folder (let's call it agnews-data). This folder should contain the following two files: train.csv and test.csv.

Step 3: Run PET (or iPET)

The following assumes that your folder structure looks something like this and that your current working directory is pet:

 pet/
 ├── cli.py
 └── ...
 rte-data/
 ├── train.jsonl
 ├── unlabeled.jsonl
 ├── test.jsonl
 └── val.jsonl
 agnews-data/
 ├── train.csv
 └── test.csv
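
Before running anything, a quick sanity check of this layout can save time; the snippet below simply verifies that every file PET expects is in place (it is not part of PET itself):

import os

expected = {
    "../rte-data": ["train.jsonl", "unlabeled.jsonl", "val.jsonl", "test.jsonl"],
    "../agnews-data": ["train.csv", "test.csv"],
}
for folder, files in expected.items():
    for name in files:
        path = os.path.join(folder, name)
        print(path, "OK" if os.path.exists(path) else "MISSING")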

If you have multiple GPUs, you may want to select just one in order to reproduce our exact results. This can be done with export CUDA_VISIBLE_DEVICES=<ID> where <ID> is the id of the GPU to be used.
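To double-check from Python that only the selected device is visible, you can run the following (just a sanity check, not required by PET):

import torch

# after export CUDA_VISIBLE_DEVICES=<ID>, only one device should be visible
print(torch.cuda.device_count())  # expected output: 1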

RTE

For RTE with ALBERT, run the following command:

 python3 cli.py \
--method pet \
--pattern_ids 0 1 2 3 \
--data_dir ../rte-data \
--model_type albert \
--model_name_or_path albert-xxlarge-v2 \
--task_name rte \
--output_dir ../rte-output \
--do_train \
--do_eval \
--pet_per_gpu_train_batch_size 2 \
--pet_gradient_accumulation_steps 8 \
--pet_max_steps 250 \
--sc_per_gpu_unlabeled_batch_size 2 \
--sc_gradient_accumulation_steps 8 \
--sc_max_steps 5000

If you want to use iPET instead of PET, simply replace --method pet with --method ipet. If your GPU has more memory available, you can speed up training by increasing --pet_per_gpu_train_batch_size and --sc_per_gpu_unlabeled_batch_size while decreasing --pet_gradient_accumulation_steps and --sc_gradient_accumulation_steps accordingly; as long as the product of batch size and accumulation steps stays the same (here, 2 x 8 = 16), the effective batch size is unchanged.

This will take several hours to run. First, this script will generate a folder p0-i0 in rte-output which will, after some time, contain the first iteration (i0) of the pretrained model corresponding to pattern 0 (p0). The script will then generate folders p0-i1, p0-i2, p1-i0, ... until p3-i2.

Finally, the script will create a folder rte-output/final/p0-i0 containing the final distilled model. The contents of this directory (including the model's predictions for the test set) are explained in the corresponding section of the repository's README.

AG's News

For AG's News, let's try RoBERTa instead of ALBERT. As RoBERTa requires less memory, we can use "Auxiliary Language Modeling" as an additional training objective to improve performance (see the paper for details); this is achieved by adding the flag --lm_training and specifying a --pet_per_gpu_unlabeled_batch_size. Also, note that there is no "official" few-shot dataset for AG's News: the train.csv file we downloaded earlier contains the entire training set with thousands of examples. However, cli.py provides the --train_examples argument, with which we can artificially downsample the training set to a specific number of examples. We can also specify --split_examples_evenly so that the downsampled dataset contains roughly the same number of examples for each label (a short sketch after the command below illustrates the resulting per-label counts). Let's try it with a total of 10 training examples (i.e., 2-3 examples per label) and 40,000 unlabeled examples (which, for this task, are also taken from train.csv by default):

 python3 cli.py \
--method pet \
--pattern_ids 0 1 2 3 4 5 \
--data_dir ../agnews-data \
--model_type roberta \
--model_name_or_path roberta-large \
--task_name agnews \
--output_dir agnews-output \
--do_train \
--do_eval \
--train_examples 10 \
--unlabeled_examples 40000 \
--split_examples_evenly \
--pet_per_gpu_train_batch_size 1 \
--pet_per_gpu_unlabeled_batch_size 3 \
--pet_gradient_accumulation_steps 4 \
--pet_max_steps 250 \
--lm_training \
--sc_per_gpu_train_batch_size 4 \
--sc_per_gpu_unlabeled_batch_size 4 \
--sc_gradient_accumulation_steps 4 \
--sc_max_steps 5000

The training process and output format will be exactly the same as for RTE (see above).
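
For reference, the following small sketch illustrates what --split_examples_evenly does conceptually for the numbers used above; it is an illustration of the idea, not the actual implementation in cli.py:

# AG's News has 4 labels; distributing 10 training examples as evenly as possible
# gives 3, 3, 2 and 2 examples per label, i.e. the "2-3 examples per label" mentioned above
def per_label_counts(num_examples, num_labels):
    base, remainder = divmod(num_examples, num_labels)
    return [base + 1 if i < remainder else base for i in range(num_labels)]

print(per_label_counts(10, 4))  # [3, 3, 2, 2]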

summerstay commented 3 years ago

Thank you! This worked for me.

santhoshkolloju commented 3 years ago

Hi, thanks for providing the example. Section 3.5 talks about "Automatic Verbalizer Search". Can you point me to the file which contains the code for this implementation?

Thanks

timoschick commented 3 years ago

Hi @santhoshkolloju, Automatic Verbalizer Search (AVS) is currently not included in this repo. We have a separate paper focusing exclusively on AVS that will be published on arXiv sometime this month; the code for AVS will be released along with this paper.

Thierryonre commented 3 years ago

So is there a way you could produce a generative chatbot using this code? I understand that its generative text abilities aren't as good as GPT-3's, but it's still worth a try. Furthermore, could you demonstrate how to use the model after it has been produced? Documentation for absolute beginners might be needed for people like me XD

timoschick commented 3 years ago

Hi @Thierryonre, we haven't tried PET in a generative fashion (for example, to produce a chatbot) and I doubt that the current version is capable of doing so if the text to be generated is longer than, say, 5-10 tokens. We are currently investigating whether PET can also be made to work in a generative setting, but this is probably going to take a while.

Furthermore, could you demonstrate how to use the model after it has been produced?

Sure, I'll add some additional documentation to the repository as soon as I find the time. In the meantime, maybe these few lines of code that show basic usage of a trained model can help:

import torch
import numpy as np

from pet import InputExample
from pet.tasks import PROCESSORS
from pet.wrapper import TransformerModelWrapper

# example values (adjust to your setup): the task name used during training and the
# directory of a trained model, e.g. the final distilled model produced by cli.py
TASK_NAME = 'agnews'
MODEL_DIR = 'agnews-output/final/p0-i0'

device = 'cuda' if torch.cuda.is_available() else 'cpu'
processor = PROCESSORS[TASK_NAME]()
wrapper = TransformerModelWrapper.from_pretrained(MODEL_DIR)
wrapper.model.to(device)

eval_data = [
    InputExample(
        idx=1,
        guid="eval-1",
        text_a="Premier League & Champions League build-up",
        text_b="Chelsea manager Lampard speaks to the media before Champions League match against Sevilla",
        label="2"
    )
]

# this returns a dictionary containing the following (for each example in eval_data):
# results['indices'][i] <- the index of the i-th example
# results['logits'][i] <- the per-class logits of the i-th example (for k classes, this is a list of k values)
# results['labels'][i] <- the internal index of the i-th example's actual label (not the predicted label)
results = wrapper.eval(eval_data, device)

# to get the model's predictions (note that this gives the class indices and not the class names):
predictions = np.argmax(results['logits'], axis=1)

# to convert the class indices to class names
class_idx_to_class_name = {idx: name for idx, name in enumerate(processor.get_labels())}
predictions = [class_idx_to_class_name[prediction] for prediction in predictions]
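
Printing the predictions then gives the predicted class name for each example; assuming the model classifies the headline above correctly, that would be "2" (the Sports class in AG's News):

print(predictions)  # e.g. ['2'] if the model assigns the Sports class to the example above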