fxmarty opened this issue 2 years ago
@fxmarty Asking since I can't really get GLUE results as good as in the paper: if you have also run other GLUE tasks, did you have to apply similar changes to the remaining tasks as well?
And also, do I understand correctly that you had to reduce the batch size from 4 * 8 = 32 to 8, given the number of GPUs?
I am facing a similar problem: when I set num_gpu=2 and add gradient_accumulation_steps=4 (which keeps the effective batch size at 32), the average over 5 random seeds on CoLA with roberta-large using LoRA is 67.0. These numbers follow the convention that "the result for each run is taken from the best epoch".
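For reference, the effective batch size is the product of the GPU count, the per-device batch size, and the gradient accumulation steps. A minimal sketch of that arithmetic (the variable names simply mirror the settings mentioned above):

```bash
# effective batch = num_gpus * per_device_train_batch_size * gradient_accumulation_steps
num_gpus=2
per_device_train_batch_size=4
gradient_accumulation_steps=4
echo $(( num_gpus * per_device_train_batch_size * gradient_accumulation_steps ))  # prints 32
```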
Does anyone know the solution? I am assuming that setting per_device_train_batch_size = 4 on a single GPU is equivalent to a total batch size of 4, which is the paper's setting, but I am still getting matthews_correlation = 0 during evaluation.
Hello, I also encountered the same problem. Did you finally solve it?
My steps:
1. Change `export num_gpus=8` to `export num_gpus=1` in `roberta_large_cola.sh`.
2. Run `CUDA_VISIBLE_DEVICES=0 bash roberta_large_cola.sh`.
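A hedged, non-interactive way to make the same change, assuming the `export num_gpus=8` line appears verbatim in the script:

```bash
# Rewrite the GPU count in place, then launch on GPU 0.
# Assumes roberta_large_cola.sh contains the literal line "export num_gpus=8".
sed -i 's/^export num_gpus=8$/export num_gpus=1/' roberta_large_cola.sh
CUDA_VISIBLE_DEVICES=0 bash roberta_large_cola.sh
```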
Running on a single A100
Using:
During training, `eval_matthews_correlation` is stuck at 0 at all epochs. I actually had the same issue on the current transformers version, and decreasing the learning rate plus removing the warmup helped to regain OK-ish numbers during training, but nothing as shiny as 0.68. Do you have an idea of what I could be doing wrong?
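For concreteness, the change I mean looks roughly like the following. This is a sketch only, assuming the standard `run_glue.py` from the transformers text-classification examples; the repo's own launcher may wrap it with extra LoRA-specific flags, and the exact learning-rate value here is illustrative:

```bash
# Hypothetical invocation with a lowered learning rate and no warmup.
CUDA_VISIBLE_DEVICES=0 python run_glue.py \
  --model_name_or_path roberta-large \
  --task_name cola \
  --do_train --do_eval \
  --per_device_train_batch_size 4 \
  --learning_rate 1e-4 \
  --warmup_ratio 0 \
  --num_train_epochs 20 \
  --output_dir ./cola_out
```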
Update: using
trains just fine, and I no longer see `eval_matthews_correlation = 0` during training.