timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0
1.62k stars, 283 forks

Cannot take a larger sample than population when 'replace=False' #80

Closed: xruifan closed this issue 2 years ago

xruifan commented 2 years ago

Hi,

I am training iPET with the following command:

python3 /home/acb19lh/pet-master/cli.py \
--method ipet --task_name idiom-detection \
--pattern_ids 0 1 2 3 \
--data_dir /home/acb19lh/pet-master/magpie-corpus-master \
--model_type bert \
--model_name_or_path bert-base-uncased \
--output_dir /data/acb19lh/results/ipet-bert/ipet-bert-30-70 \
--do_train \
--do_eval \
--train_examples 5784 \
--unlabeled_examples 13496 \
--split_examples_evenly \
--pet_per_gpu_train_batch_size 4 \
--pet_per_gpu_unlabeled_batch_size 8 \
--pet_gradient_accumulation_steps 2 \
--pet_max_steps 250 \
--lm_training \
--sc_per_gpu_train_batch_size 8 \
--sc_per_gpu_unlabeled_batch_size 8 \
--sc_gradient_accumulation_steps 2 \
--sc_max_steps 5000 \
--ipet_generations 5 \
--ipet_n_most_likely 100

But after the first generation, I got:

Traceback (most recent call last):
  File "/home/acb19lh/pet-master/cli.py", line 282, in <module>
    main()
  File "/home/acb19lh/pet-master/cli.py", line 266, in main
    pet.train_ipet(pet_model_cfg, pet_train_cfg, pet_eval_cfg, ipet_cfg, sc_model_cfg, sc_train_cfg, sc_eval_cfg,
  File "/home/acb19lh/pet-master/pet/modeling.py", line 191, in train_ipet
    generate_ipet_train_sets(train_data=train_data, unlabeled_data=unlabeled_data,
  File "/home/acb19lh/pet-master/pet/modeling.py", line 679, in generate_ipet_train_sets
    subdir_train_set = generate_ipet_train_set(
  File "/home/acb19lh/pet-master/pet/modeling.py", line 753, in generate_ipet_train_set
    label_examples = _draw_examples_by_label_probability(
  File "/home/acb19lh/pet-master/pet/modeling.py", line 764, in _draw_examples_by_label_probability
    return rng.choice(examples, size=num_examples, replace=False, p=label_probabilities).tolist()
  File "mtrand.pyx", line 965, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'
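
For readers hitting the same traceback: the last frame shows `rng.choice(..., size=num_examples, replace=False, ...)`, and NumPy raises this error whenever the requested sample size exceeds the number of items in the pool. The snippet below is a minimal sketch that reproduces the error in isolation. The pool size of 100, the request for 250 items, and the helper `draw_without_replacement` are illustrative assumptions, not code or values taken from the PET repository, and capping the sample size is only one possible workaround.

```python
import numpy as np


def draw_without_replacement(examples, num_examples, probabilities, rng):
    """Hypothetical helper (not part of PET): cap the sample size at the pool size."""
    size = min(num_examples, len(examples))
    # Sample indices rather than the objects themselves, then look them up.
    indices = rng.choice(len(examples), size=size, replace=False, p=probabilities)
    return [examples[i] for i in indices]


rng = np.random.RandomState(42)

# Assumed numbers for illustration: a pool of 100 candidates, 250 requested.
pool = ["example-%d" % i for i in range(100)]
probs = np.full(len(pool), 1.0 / len(pool))

try:
    # Asking for more items than the pool holds fails when replace=False.
    rng.choice(pool, size=250, replace=False, p=probs)
except ValueError as err:
    message = str(err)
    print(message)

# The capped variant returns at most len(pool) distinct items.
capped = draw_without_replacement(pool, 250, probs, rng)
print(len(capped))
```

If the number of examples requested per label is larger than the candidate pool left after filtering, either the pool has to grow or the request has to shrink; which of the two is appropriate depends on the experiment.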

My question: can the --ipet_n_most_likely flag only be used when there are zero training examples?

Thanks.

xruifan commented 2 years ago

It does not make sense to select the n most likely examples when labelled data is available, so I am closing this issue.