Hi, @intfloat
I trained with batch size 64 on a single 3090 and evaluated the results with bert-base-uncased and bert-large-uncased respectively. The results show that bert-base-uncased performs better. During training, I also noticed that the loss seems to decrease more slowly with bert-large-uncased. I am wondering whether, with more epochs, bert-large-uncased would catch up with bert-base-uncased or even surpass it.
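For reference, the only things I changed from the default setup were the backbone name and the per-GPU batch size; everything else was left as-is. A rough sketch of what I mean (not your actual training script, just the assumed equivalents):

```python
# Minimal sketch of the changes I made (not the repo's training code):
# swap the backbone checkpoint and set the per-GPU batch size to 64.
from transformers import AutoModel, AutoTokenizer

backbone = "bert-large-uncased"   # baseline run used "bert-base-uncased"
batch_size = 64                   # per-device batch size on a single RTX 3090

tokenizer = AutoTokenizer.from_pretrained(backbone)
encoder = AutoModel.from_pretrained(backbone)
print(f"Training {backbone} with per-GPU batch size {batch_size}")
```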
Looking forward to your reply.