timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

train ipet with zero training examples #66

Closed EneruMin closed 2 years ago

EneruMin commented 2 years ago

Hi, I am training iPET with zero training examples using the following command:

```
python3 cli.py --method ipet --pattern_ids 0 1 2 3 4 --data_dir /share/home/zqzeng/wmni/data/ag_news_csv/ag_news_csv --model_type roberta --model_name_or_path /share/home/zqzeng/transformers/roberta-large --task_name agnews --output_dir /share/home/zqzeng/wmni/data/output/unsupervised-ipet --do_train --do_eval --pet_repetitions 1 --ipet_n_most_likely 100 --reduction mean --train_examples 0
```

I got the following result:

```
2021-11-09 20:22:31,904 - INFO - tasks - Creating features from dataset file at ag_news_csv/ (num_examples=0, set_type=train)
2021-11-09 20:22:34,978 - INFO - tasks - Returning 120000 train examples with label dist.: [('3', 30000), ('4', 30000), ('2', 30000), ('1', 30000)]
```

I followed the flow of the program and found that all 120000 training examples were used to train each individual model. When I used `--train_examples 10`, the behavior was as expected:

```
2021-11-09 20:19:13,402 - INFO - tasks - Creating features from dataset file at ag_news_csv/ (num_examples=10, set_type=train)
2021-11-09 20:19:16,127 - INFO - tasks - Returning 10 train examples with label dist.: [('1', 3), ('4', 4), ('2', 2), ('3', 1)]
```

Does `--train_examples 0` not work? I would be grateful for a prompt reply.

EneruMin commented 2 years ago

The above problem is caused by the following method in tasks.py:

```python
def _shuffle_and_restrict(examples: List[InputExample], num_examples: int, seed: int = 42) -> List[InputExample]:
    """
    Shuffle a list of examples and restrict it to a given maximum size.

    :param examples: the examples to shuffle and restrict
    :param num_examples: the maximum number of examples
    :param seed: the random seed for shuffling
    :return: the first ``num_examples`` elements of the shuffled list
    """
    if 0 < num_examples < len(examples):
        random.Random(seed).shuffle(examples)
        examples = examples[:num_examples]
    return examples
```

When `num_examples` equals 0, this method returns all the training examples, which are then used to fine-tune the language model. So I wonder how you use iPET with zero training examples.
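To make the failure mode concrete, here is a minimal, self-contained reproduction (the `InputExample` stand-in is just for this demonstration; it is not the class from `pet/tasks.py`):

```python
import random

class InputExample:
    """Stand-in for pet's InputExample, just for this demonstration."""
    def __init__(self, guid):
        self.guid = guid

def _shuffle_and_restrict(examples, num_examples, seed=42):
    # 0 < 0 is False, so the guard is skipped when num_examples == 0 ...
    if 0 < num_examples < len(examples):
        random.Random(seed).shuffle(examples)
        examples = examples[:num_examples]
    return examples  # ... and the full list is returned unchanged

examples = [InputExample(i) for i in range(120000)]
print(len(_shuffle_and_restrict(examples, 10)))  # 10
print(len(_shuffle_and_restrict(examples, 0)))   # 120000 instead of 0
```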

timoschick commented 2 years ago

Hi @EneruMin, you are absolutely correct, this is an error in the code: it should be `if 0 <= num_examples < len(examples)`. I'll update the code accordingly.
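For reference, the corrected method would look like this (the same code as quoted above, with only the comparison changed):

```python
def _shuffle_and_restrict(examples: List[InputExample], num_examples: int, seed: int = 42) -> List[InputExample]:
    """Shuffle a list of examples and restrict it to a given maximum size."""
    if 0 <= num_examples < len(examples):  # <= makes num_examples == 0 yield an empty list
        random.Random(seed).shuffle(examples)
        examples = examples[:num_examples]
    return examples
```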

The reason things still worked in our experiments is that we additionally specified the `--split_examples_evenly` option. Without this bug, that option shouldn't have any effect in the zero-shot setting (it basically just tells the script to choose training examples so that there is the same number of examples for each label), but it causes the script to skip `_shuffle_and_restrict` and instead use a `LimitedExampleList`, which handles the case of 0 examples correctly:

```python
if num_examples is not None:
    examples = _shuffle_and_restrict(examples, num_examples, seed)

elif num_examples_per_label is not None:
    limited_examples = LimitedExampleList(processor.get_labels(), num_examples_per_label)
    for example in examples:
        limited_examples.add(example)
    examples = limited_examples.to_list()
```

So to fix this issue, you can either (a) replace `0 < num_examples [...]` with `0 <= num_examples [...]` (or just wait for the next update), (b) specify the `--split_examples_evenly` option, or (c) simply change your `TaskProcessor` so that it doesn't return any training examples in the first place.
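For clarity, the per-label limiting behaves roughly like this (a simplified sketch, not the exact implementation in `pet/tasks.py`):

```python
from collections import defaultdict

class LimitedExampleList:
    """Simplified sketch of pet's LimitedExampleList; details may differ."""

    def __init__(self, labels, max_examples):
        # max_examples: either a single limit applied to every label,
        # or a list with one limit per label
        if isinstance(max_examples, int):
            max_examples = [max_examples] * len(labels)
        self._max = dict(zip(labels, max_examples))
        self._counts = defaultdict(int)
        self._examples = []

    def add(self, example) -> bool:
        """Add the example if its label is not yet full; return success."""
        if self._counts[example.label] < self._max[example.label]:
            self._counts[example.label] += 1
            self._examples.append(example)
            return True
        return False

    def to_list(self):
        return self._examples
```

With a limit of 0 per label, `add()` never accepts anything, so `to_list()` correctly returns an empty list.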

EneruMin commented 2 years ago

Hi @timoschick, thanks for your suggestion. I specified the `--split_examples_evenly` option, and this time it returned zero training examples. But another error occurred:

```
Traceback (most recent call last):
  File "cli.py", line 283, in <module>
    main()
  File "cli.py", line 271, in main
    eval_data=eval_data, do_train=args.do_train, do_eval=args.do_eval, seed=args.seed)
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/modeling.py", line 216, in train_ipet
    eval_data=eval_data, do_train=do_train, do_eval=do_eval)
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/modeling.py", line 298, in train_classifier
    do_eval=do_eval, seed=seed)
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/modeling.py", line 358, in train_pet_ensemble
    unlabeled_data=unlabeled_data))
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/modeling.py", line 461, in train_single_model
    temperature=config.temperature
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/wrapper.py", line 229, in train
    train_sampler = RandomSampler(train_dataset)
  File "/share/home/zqzeng/anaconda3/envs/wmni/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
```

In the `train()` method, `task_train_data` is empty, so `RandomSampler(train_dataset)` raises an error. I solved this problem by checking whether `task_train_data` is empty before building the sampler (see the sketch below). But my final accuracy is 0.823, while the accuracy in your paper is 0.875 (on the AG's News task). Is that reasonable?
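The guard I added is essentially this (my local workaround, not an official fix; shown here as a self-contained snippet rather than the actual code in `pet/wrapper.py`):

```python
import torch
from torch.utils.data import TensorDataset, RandomSampler

# An empty training set, as in the zero-shot iPET setting:
train_dataset = TensorDataset(torch.empty(0, 3))

# Building the sampler unconditionally reproduces the traceback above:
#   RandomSampler(train_dataset)
#   -> ValueError: num_samples should be a positive integer value, but got num_samples=0

# Workaround: only build the sampler (and run the supervised training
# loop) when there is actually labeled data to train on.
if len(train_dataset) > 0:
    train_sampler = RandomSampler(train_dataset)
else:
    print("task_train_data is empty; skipping supervised fine-tuning")
```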

timoschick commented 2 years ago

Interesting... I'll check why we didn't get a similar error as soon as I find the time. Regardless, the final accuracy should be much better than the one you've reported. There are a couple of differences between your command and the one that we used, so I cannot tell what exactly causes the difference. Could you tell me the results after each iteration (the contents of the `result_test.txt` file in each iteration's directory)? That will help to identify the point where things diverge.

A couple of notes regarding possible differences:

Finally, you may get different results due to the random selection of examples and model initialization (but those should not account for more than 5% difference in performance). If you want to reproduce our exact results and none of the above helps, you can check out the `v1.1.0` branch, which contains the script that we used for iPET.

EneruMin commented 2 years ago

The results of each iteration are shown below.

g0:

```
acc-p0: 0.6531578947368422 +- 0
acc-p1: 0.7471052631578947 +- 0
acc-p2: 0.5906578947368422 +- 0
acc-p3: 0.7082894736842106 +- 0
acc-p4: 0.7942105263157895 +- 0
acc-all-p: 0.6986842105263158 +- 0.07953688886139995
```

g1:

```
acc-p0: 0.7794736842105263 +- 0
acc-p1: 0.7235526315789473 +- 0
acc-p2: 0.6317105263157895 +- 0
acc-p3: 0.7476315789473684 +- 0
acc-p4: 0.7796052631578947 +- 0
acc-all-p: 0.7323947368421052 +- 0.06101826675772052
```

g2:

```
acc-p0: 0.8057894736842105 +- 0
acc-p1: 0.7156578947368422 +- 0
acc-p2: 0.7773684210526316 +- 0
acc-p3: 0.8132894736842106 +- 0
acc-p4: 0.7723684210526316 +- 0
acc-all-p: 0.7768947368421053 +- 0.03850371874690442
```

final:

```
acc-p0: 0.8228947368421052 +- 0
acc-all-p: 0.8228947368421052 +- 0
```

According to Figure 4 in your paper, I think maybe I should use 4 or 5 iterations.
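If I read cli.py correctly, the number of generations is controlled by the `--ipet_generations` argument (its default appears to be 3, which matches the g0/g1/g2 directories above), so I will try rerunning with something like:

```
python3 cli.py --method ipet ... --ipet_generations 5
```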