wangbo9719 / StAR_KGC


Training time #17

Closed meaningful96 closed 5 months ago

meaningful96 commented 1 year ago

Hello. I have a question about training the StAR model. I am currently running:

    CUDA_VISIBLE_DEVICES=0 \
    python get_new_dev_dict.py \
        --model_class bert \
        --weight_decay 0.01 \
        --learning_rate 5e-5 \
        --adam_epsilon 1e-6 \
        --max_grad_norm 0. \
        --warmup_proportion 0.05 \
        --do_train \
        --num_train_epochs 7 \
        --dataset WN18RR \
        --max_seq_length 128 \
        --gradient_accumulation_steps 4 \
        --train_batch_size 16 \
        --eval_batch_size 128 \
        --logging_steps 100 \
        --eval_steps -1 \
        --save_steps 2000 \
        --model_name_or_path bert-base-uncased \
        --do_lower_case \
        --output_dir ./result/WN18RR_get_dev \
        --num_worker 12 \
        --seed 42

I am running this on both FB15k-237 and WN18RR, each on a different GPU. However, for both datasets the process takes very long: for FB15k-237, one pass over the training data took about 50 hours, and for WN18RR each pass takes about 15 hours. Is this normal?
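
For reference, here is a rough back-of-the-envelope estimate of the per-epoch workload implied by these flags. The triple count is the commonly cited WN18RR training-split size (not verified against this repo's loader), and StAR additionally constructs negative samples per triple, so the true number of forward passes will be a multiple of this:

    train_triples = 86_835    # assumed size of the standard WN18RR training split
    batch_size = 16           # --train_batch_size
    grad_accum = 4            # --gradient_accumulation_steps

    batches_per_epoch = -(-train_triples // batch_size)       # ceiling division
    optimizer_steps_per_epoch = batches_per_epoch // grad_accum

    print(f"effective batch size:      {batch_size * grad_accum}")    # 64
    print(f"batches per epoch:         {batches_per_epoch}")          # 5428
    print(f"optimizer steps per epoch: {optimizer_steps_per_epoch}")  # 1357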

wangbo9719 commented 1 year ago

Sorry for the late response.

If your GPU's computing power is limited, it is normal to spend this much time on training, since the model fully fine-tunes RoBERTa-large. The FB15k-237 dataset is much larger than WN18RR, so it will take longer.
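
As a rough sanity check, the reported epoch times do scale approximately with dataset size. This assumes the commonly cited training-split sizes (272,115 triples for FB15k-237 and 86,835 for WN18RR), not numbers verified against this repo's preprocessing:

    fb15k237_triples = 272_115   # assumed FB15k-237 training triples
    wn18rr_triples = 86_835      # assumed WN18RR training triples

    print(f"dataset size ratio:  {fb15k237_triples / wn18rr_triples:.2f}")  # ~3.13
    print(f"reported time ratio: {50 / 15:.2f}")   # ~3.33 (50 h vs. 15 h per pass)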