EneruMin closed this issue 2 years ago.
The above problem is caused by the following method in tasks.py.
def _shuffle_and_restrict(examples: List[InputExample], num_examples: int, seed: int = 42) -> List[InputExample]:
    """
    Shuffle a list of examples and restrict it to a given maximum size.
    :param examples: the examples to shuffle and restrict
    :param num_examples: the maximum number of examples
    :param seed: the random seed for shuffling
    :return: the first ``num_examples`` elements of the shuffled list
    """
    if 0 < num_examples < len(examples):
        random.Random(seed).shuffle(examples)
        examples = examples[:num_examples]
    return examples
When num_examples equals 0, this method returns all the training examples, which are then used to fine-tune the language model. So I wonder how you used iPET with zero training examples.
Hi @EneruMin, you are absolutely correct, this is an error in the code: the condition should be if 0 <= num_examples < len(examples). I'll update the code accordingly.
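The corrected condition described above can be illustrated with a minimal, self-contained sketch. It is not the repository's code: plain strings stand in for InputExample, the function name drops the leading underscore, and the input list is copied before shuffling (an assumption made here so the caller's list is left untouched).

```python
import random
from typing import List


def shuffle_and_restrict(examples: List[str], num_examples: int, seed: int = 42) -> List[str]:
    """Shuffle and truncate, treating 0 as a real limit rather than as 'no limit'."""
    # `0 <= ...` lets num_examples == 0 enter the branch and yield an empty list.
    if 0 <= num_examples < len(examples):
        examples = list(examples)  # copy so the caller's list is not shuffled in place
        random.Random(seed).shuffle(examples)
        examples = examples[:num_examples]
    return examples


data = ["a", "b", "c", "d"]
print(len(shuffle_and_restrict(data, 0)))   # 0
print(len(shuffle_and_restrict(data, 2)))   # 2
print(len(shuffle_and_restrict(data, -1)))  # 4
```

In this sketch a negative value still means "no limit", so only the zero case changes behavior.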
The reason why things still worked in our experiments is that we additionally specified the --split_examples_evenly option. Without this error, this option shouldn't have any effect in the zero-shot setting (it basically just tells the script to choose training examples so that there is the same number of examples for each label), but it causes the script to not use _shuffle_and_restrict and instead use a LimitedExampleList, which handles the case of 0 examples correctly:
if num_examples is not None:
    examples = _shuffle_and_restrict(examples, num_examples, seed)
elif num_examples_per_label is not None:
    limited_examples = LimitedExampleList(processor.get_labels(), num_examples_per_label)
    for example in examples:
        limited_examples.add(example)
    examples = limited_examples.to_list()
So to fix this issue, you can either (a) replace 0 < num_examples [...] with 0 <= num_examples [...] (or just wait for the next update), (b) specify the --split_examples_evenly option, or (c) simply change your TaskProcessor so that it doesn't return any training examples in the first place.
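To see why a per-label limit handles the zero case naturally, here is a rough sketch of that idea. PerLabelLimiter and the (text, label) tuples are hypothetical stand-ins written for this illustration; the actual LimitedExampleList in the repository may be implemented differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (text, label); a stand-in for InputExample


class PerLabelLimiter:
    """Hypothetical stand-in: keeps at most `max_per_label` examples per label."""

    def __init__(self, labels: List[str], max_per_label: int) -> None:
        self._max: Dict[str, int] = {label: max_per_label for label in labels}
        self._counts: Dict[str, int] = defaultdict(int)
        self._examples: List[Example] = []

    def add(self, example: Example) -> bool:
        label = example[1]
        # With a limit of 0 this comparison is never true, so nothing is kept.
        if self._counts[label] < self._max.get(label, 0):
            self._counts[label] += 1
            self._examples.append(example)
            return True
        return False

    def to_list(self) -> List[Example]:
        return list(self._examples)


pool = [("t1", "1"), ("t2", "2"), ("t3", "1")]
limiter = PerLabelLimiter(["1", "2"], 0)
for ex in pool:
    limiter.add(ex)
print(limiter.to_list())  # []
```

Because each example is admitted only while its label's count is below the limit, a limit of 0 needs no special case.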
Hi @timoschick, thanks for your suggestion.
I specified the --split_examples_evenly option. This time it returned zero training examples, but another error occurred.
Traceback (most recent call last):
  File "cli.py", line 283, in <module>
    main()
  File "cli.py", line 271, in main
    eval_data=eval_data, do_train=args.do_train, do_eval=args.do_eval, seed=args.seed)
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/modeling.py", line 216, in train_ipet
    eval_data=eval_data, do_train=do_train, do_eval=do_eval)
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/modeling.py", line 298, in train_classifier
    do_eval=do_eval, seed=seed)
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/modeling.py", line 358, in train_pet_ensemble
    unlabeled_data=unlabeled_data))
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/modeling.py", line 461, in train_single_model
    temperature=config.temperature
  File "/share/home/zqzeng/wmni/pet-master-edited/pet-master/pet/wrapper.py", line 229, in train
    train_sampler = RandomSampler(train_dataset)
  File "/share/home/zqzeng/anaconda3/envs/wmni/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
In the train() method, the task_train_data is none, so the call RandomSampler(train_dataset) raises an error.
I solved this problem by checking in the code whether the task_train_data is none.
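The kind of check described above could look roughly like the following sketch. It does not reproduce the actual change to wrapper.py: train_if_nonempty and train_fn are hypothetical names, and the supervised training step is represented by an arbitrary callable so the example runs without PyTorch.

```python
from typing import Callable, Optional, Sequence


def train_if_nonempty(
    train_examples: Optional[Sequence],
    train_fn: Callable[[Sequence], float],
) -> Optional[float]:
    """Run `train_fn` only when there is at least one supervised example."""
    if not train_examples:  # covers both None and an empty sequence
        return None  # skip the step that would build a sampler over 0 samples
    return train_fn(train_examples)


# A dummy training function that just reports how many examples it saw.
print(train_if_nonempty([], lambda xs: float(len(xs))))          # None
print(train_if_nonempty(["x", "y"], lambda xs: float(len(xs))))  # 2.0
```

Returning early keeps the zero-shot path from ever constructing a sampler over an empty dataset, which is where the ValueError in the traceback originates.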
But the final accuracy in my result is 0.823, while the accuracy in your paper is 0.875 (on AGNews task). Is it reasonable?
Interesting... I'll check why we didn't get a similar error as soon as I find the time. Regardless, the final accuracy should be much better than the one you've reported. There are a couple of differences between your command and the one that we have used, so I cannot tell what exactly causes the difference. Could you tell me the results after each iteration (the contents of the result_test.txt file in each iteration's directory)? That will help to identify the point where things diverge.
A couple of notes regarding possible differences:
- We used --pet_repetitions 3, whereas you've been using --pet_repetitions 1. Using more models stabilizes results.
- We used --lm_training (with a ratio of 1:3 training examples and unlabeled examples).
- The number of iPET generations may differ (--ipet_generations). You can check that in our paper.
Finally, you may get different results due to random selection of examples and model initialization (but those should not account for more than 5% difference in performance). If you want to reproduce our exact results and none of the above helps, you can check out the v1.1.0 branch that contains the script that we have used for iPET.
The results of each iteration are shown below.
g0
acc-p0: 0.6531578947368422 +- 0
acc-p1: 0.7471052631578947 +- 0
acc-p2: 0.5906578947368422 +- 0
acc-p3: 0.7082894736842106 +- 0
acc-p4: 0.7942105263157895 +- 0
acc-all-p: 0.6986842105263158 +- 0.07953688886139995
g1
acc-p0: 0.7794736842105263 +- 0
acc-p1: 0.7235526315789473 +- 0
acc-p2: 0.6317105263157895 +- 0
acc-p3: 0.7476315789473684 +- 0
acc-p4: 0.7796052631578947 +- 0
acc-all-p: 0.7323947368421052 +- 0.06101826675772052
g2
acc-p0: 0.8057894736842105 +- 0
acc-p1: 0.7156578947368422 +- 0
acc-p2: 0.7773684210526316 +- 0
acc-p3: 0.8132894736842106 +- 0
acc-p4: 0.7723684210526316 +- 0
acc-all-p: 0.7768947368421053 +- 0.03850371874690442
final
acc-p0: 0.8228947368421052 +- 0
acc-all-p: 0.8228947368421052 +- 0
According to Figure 4 in your paper, I think maybe I should use 4 or 5 iterations.
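As a sanity check on how the acc-all-p lines relate to the per-pattern lines, the g0 numbers reported above can be aggregated by hand. The sketch below assumes the summary is the mean of the five pattern accuracies and that the value after "+-" is the sample standard deviation; the reported figures are consistent with that, but this is an inference from the numbers, not a statement about the script.

```python
import statistics

# Per-pattern accuracies for generation g0, copied from the results above.
g0 = [
    0.6531578947368422,
    0.7471052631578947,
    0.5906578947368422,
    0.7082894736842106,
    0.7942105263157895,
]

mean = statistics.mean(g0)
spread = statistics.stdev(g0)  # sample standard deviation (n - 1 denominator)
print(f"acc-all-p: {mean} +- {spread}")
```

With a single repetition each per-pattern line has nothing to average over, which would explain the "+- 0" entries.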
Hi, I am training iPET with zero training examples. I ran the following command:
python3 cli.py --method ipet --pattern_ids 0 1 2 3 4 --data_dir /share/home/zqzeng/wmni/data/ag_news_csv/ag_news_csv --model_type roberta --model_name_or_path /share/home/zqzeng/transformers/roberta-large --task_name agnews --output_dir /share/home/zqzeng/wmni/data/output/unsupervised-ipet --do_train --do_eval --pet_repetitions 1 --ipet_n_most_likely 100 --reduction mean --train_examples 0
And I got the following result:
2021-11-09 20:22:31,904 - INFO - tasks - Creating features from dataset file at ag_news_csv/ (num_examples=0, set_type=train)
2021-11-09 20:22:34,978 - INFO - tasks - Returning 120000 train examples with label dist.: [('3', 30000), ('4', 30000), ('2', 30000), ('1', 30000)]
I followed the flow of the program and found that the whole training set (120000 examples) was used to train each individual model. When I used "--train_examples 10", the behavior was normal, as shown below:
2021-11-09 20:19:13,402 - INFO - tasks - Creating features from dataset file at ag_news_csv/ (num_examples=10, set_type=train)
2021-11-09 20:19:16,127 - INFO - tasks - Returning 10 train examples with label dist.: [('1', 3), ('4', 4), ('2', 2), ('3', 1)]
Does training with zero examples not work? I would be grateful for your prompt reply.