mourga / contrastive-active-learning

Code for the EMNLP 2021 Paper "Active Learning by Acquiring Contrastive Examples" & the ACL 2022 Paper "On the Importance of Effectively Adapting Pretrained Language Models for Active Learning"
GNU General Public License v3.0

Issues related to the running #1

Closed szhang42 closed 2 years ago

szhang42 commented 3 years ago

Hello,

Thanks for this great repo! I first followed the README to set up the environment, including downloading the data. I have two questions about running the code:

  1. When I run this command: python run_al.py --dataset_name sst-2 --acquisition random, I get the error below.
  2. When I run this command: python run_al.py --dataset_name imdb --acquisition cal, I also get an error, shown below.

If possible, could you please advise on these two? Thanks very much!

mourga commented 3 years ago

Hi @szhang42, thanks for reaching out!

First, please check that you have created the virtual environment correctly and that all packages match the versions in requirements.txt. This is especially important for the torch and transformers packages, because the code will not run properly with other versions.
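A quick way to do that check is sketched below. It is only an illustration, not part of this repo: the helper names parse_pins and find_mismatches are made up here, and it assumes requirements.txt pins packages with plain name==version lines. It uses importlib.metadata from the standard library (Python 3.8+) and ignores local build suffixes such as the +cu111 that pip can report for torch.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(lines):
    """Return {package: pinned_version} for lines of the form 'name==version'.

    Hypothetical helper: unpinned lines (e.g. 'numpy>=1.0') are skipped.
    """
    pins = {}
    for raw in lines:
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins, installed_version=version):
    """Return {package: (pinned, installed_or_None)} for every mismatch.

    installed_version defaults to importlib.metadata.version; it is a
    parameter only so the check can be exercised without real packages.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            found = installed_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        # Compare without a local build suffix such as '+cu111'.
        comparable = found.split("+", 1)[0] if found is not None else None
        if comparable != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

To use it from the repo root with the virtual environment active, read the file with open("requirements.txt"), pass the file object to parse_pins, and print whatever find_mismatches returns. An empty result means every pinned package is installed at the pinned version.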

I just debugged it and for me it's working. My log for python run_al.py --dataset_name imdb --acquisition cal is the following:

torch: 1.9.0
cuda: 11.1
Cuda available: True
device: cuda:0
output_dir=/home/acp19am/contrastive-active-learning/checkpoints/imdb_bert_cal_1328/imdb_bert-cls
Created /home/acp19am/contrastive-active-learning/checkpoints/imdb_bert_cal_1328/imdb_bert-cls
10/13/2021 12:11:51 - WARNING - __main__ -   Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
10/13/2021 12:11:52 - INFO - utilities.data_loader -   Creating dataset from dataset file at /home/acp19am/contrastive-active-learning/data/IMDB
10/13/2021 12:11:53 - INFO - utilities.preprocessors -   Writing example 0/22500
10/13/2021 12:11:59 - INFO - utilities.preprocessors -   Writing example 10000/22500
10/13/2021 12:12:04 - INFO - utilities.preprocessors -   Writing example 20000/22500
10/13/2021 12:12:06 - INFO - utilities.data_loader -   Saving dataset into cached file /home/acp19am/contrastive-active-learning/data/IMDB/cached_train_imdb_original
10/13/2021 12:12:50 - INFO - utilities.data_loader -   Creating dataset from dataset file at /home/acp19am/contrastive-active-learning/data/IMDB
10/13/2021 12:12:50 - INFO - utilities.preprocessors -   Writing example 0/2500
10/13/2021 12:12:51 - INFO - utilities.data_loader -   Saving dataset into cached file /home/acp19am/contrastive-active-learning/data/IMDB/cached_dev_imdb_original
10/13/2021 12:12:57 - INFO - utilities.data_loader -   Creating dataset from dataset file at /home/acp19am/contrastive-active-learning/data/IMDB
10/13/2021 12:12:58 - INFO - utilities.preprocessors -   Writing example 0/25000
10/13/2021 12:13:04 - INFO - utilities.preprocessors -   Writing example 10000/25000
10/13/2021 12:13:09 - INFO - utilities.preprocessors -   Writing example 20000/25000
10/13/2021 12:13:12 - INFO - utilities.data_loader -   Saving dataset into cached file /home/acp19am/contrastive-active-learning/data/IMDB/cached_test_imdb_original
10/13/2021 12:14:01 - INFO - utilities.data_loader -   Creating dataset from dataset file at /home/acp19am/contrastive-active-learning/data/SST-2
10/13/2021 12:14:01 - INFO - utilities.preprocessors -   Writing example 0/871
10/13/2021 12:14:01 - INFO - utilities.data_loader -   Saving dataset into cached file /home/acp19am/contrastive-active-learning/data/SST-2/cached_test_sst-2_original

train set stats: class 0: 49% class 1: 51% 
validation set stats: class 0: 50% class 1: 50% 
test set stats: class 0: 50% class 1: 50% 

Dataset for annotation: imdb
Acquisition function: cal
Budget: 425% of labeled data

Created /home/acp19am/contrastive-active-learning/experiments/al_imdb_bert_cal
Created /home/acp19am/contrastive-active-learning/experiments/al_imdb_bert_cal/1328_cls
init % class 1: 52.0
init % class 0: 48.0

 Start Training model of iteration 1!

I will commit the IMDB and SST-2 datasets as well to make sure you have downloaded the correct files.

I hope this helps!

szhang42 commented 3 years ago

Hello @mourga ,

Thanks very much for your response! The issues above all seem to be resolved after I changed from A4000 to RTX machines. I have successfully run on SST-2, IMDB, etc.

The only remaining issue is with DBPEDIA. I followed the instructions to download dbpedia_csv.tar.gz, which includes train.csv and test.csv. However, when I run the command python run_al.py --dataset_name dbpedia --acquisition cal, I get the error below. Could you please advise? Is this due to the dataset, or do I need any extra settings for DBPEDIA? Thanks!