timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0
1.62k stars 283 forks

Training Time Issue #83

Open imethanlee opened 2 years ago

imethanlee commented 2 years ago

Hi,

What is the expected time to train a PET model on the yelp_full dataset (with default arguments)? I started the training the day before yesterday with an RTX 3090 GPU, and it is still running.

Thanks.

timoschick commented 2 years ago

I don't know how efficient RTX 3090s are, but with a single Nvidia GeForce 1080Ti, training PET (not iPET) with the default parameters is a matter of a few hours. In case you haven't fixed the issue yourself yet, could you provide me with the exact command that you used to train the model? Also, did you check (e.g., with nvidia-smi) whether the GPU is actually being used?
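The same check can be run from Python instead of nvidia-smi. Below is a minimal sketch (the helper name `cuda_diagnostics` is hypothetical, not part of this repository) that reports whether PyTorch can see the GPU, and degrades gracefully if torch is not installed:

```python
import importlib.util


def cuda_diagnostics():
    """Return a dict describing the local PyTorch/CUDA setup,
    or None if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    available = torch.cuda.is_available()
    return {
        "torch_version": torch.__version__,
        "cuda_available": available,
        # torch.version.cuda is the CUDA version torch was built against
        "cuda_build": torch.version.cuda,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(cuda_diagnostics())
```

If `cuda_available` is False here, training silently falls back to the CPU, which would easily explain multi-day runtimes.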

jmcrey commented 2 years ago

Hi @timoschick,

I am having the same issue here. I started the training on an RTX 3090 yesterday and it is still running. The command I am using is as follows:

python pet/cli.py \
    --method pet \
    --pattern_ids 0 3 5 \
    --data_dir ${DATA_DIR} \
    --model_type albert \
    --model_name_or_path albert-xxlarge-v2 \
    --task_name boolq \
    --output_dir ${OUTPUT_DIR} \
    --do_train \
    --do_eval \
    --pet_per_gpu_eval_batch_size 8 \
    --pet_per_gpu_train_batch_size 2 \
    --pet_gradient_accumulation_steps 8 \
    --pet_max_steps 250 \
    --pet_max_seq_length 256 \
    --pet_repetitions 3 \
    --sc_per_gpu_train_batch_size 2 \
    --sc_per_gpu_unlabeled_batch_size 2 \
    --sc_gradient_accumulation_steps 8 \
    --sc_max_steps 5000 \
    --sc_max_seq_length 256 \
    --sc_repetitions 1

jmcrey commented 2 years ago

Just a heads up: I bumped the version of PyTorch to 1.8.0 and CUDA to 11.3, and that solved the performance issue. I am now able to run through the first 126 epochs in about 12 minutes, compared to 1.5 hours before. I am still waiting to see whether this affects the results, but the performance is much better.
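One plausible explanation (an assumption on my part, not confirmed in this thread) is that PyTorch wheels built against pre-11.x CUDA do not ship compiled kernels for the RTX 3090's sm_86 architecture, so operations take a slow fallback path. A small sketch to check whether the installed wheel includes kernels for the local GPU (`gpu_arch_supported` is a hypothetical helper name):

```python
import importlib.util


def gpu_arch_supported():
    """Return True/False depending on whether the installed PyTorch wheel
    ships kernels for the local GPU's compute capability, or None if the
    check cannot be performed (no torch, no GPU, or torch too old)."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    if not torch.cuda.is_available() or not hasattr(torch.cuda, "get_arch_list"):
        return None
    # e.g. (8, 6) for an RTX 3090
    major, minor = torch.cuda.get_device_capability(0)
    return f"sm_{major}{minor}" in torch.cuda.get_arch_list()


if __name__ == "__main__":
    print(gpu_arch_supported())
```

If this returns False, upgrading to a PyTorch build matching a newer CUDA toolkit, as described above, is the usual fix.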

jacksonchen1998 commented 1 year ago

@jmcrey So, were the results okay?

I'm now using a 1080 Ti with CUDA 11.5 and TensorRT, training for 3 epochs. My pre-trained model is RoBERTa-large and the dataset is AG News, with all other arguments set to default. It looks like training will take about half a day.