Closed dahlian00 closed 5 months ago
I noticed the argument is defined as `parser.add_argument('--batch-size', type=int, default=2, help='Batch Size for the model')`, but looking at the code, it seems this specifies the batch size per GPU.
Sorry for asking before taking a closer look at the code.
Thank you for your excellent research!
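If the flag really is per GPU, the effective (global) batch size is the per-GPU value multiplied by the number of GPUs. A minimal sketch of that arithmetic, assuming a standard DistributedDataParallel-style setup (the variable names here are illustrative, not from the repo):

```python
# Assumption: with multi-GPU data-parallel training, --batch-size is the
# per-GPU batch size, so each optimizer step sees per_gpu_batch * num_gpus samples.
per_gpu_batch = 2   # value passed via --batch-size
num_gpus = 4        # e.g. 4x A100

effective_batch = per_gpu_batch * num_gpus
print(effective_batch)  # -> 8
```

So `--batch-size 2` on 4 GPUs would correspond to a global batch size of 8.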
I tried your fine-tuning code on 4 A100 GPUs. However, the maximum batch size I can use is 8; any larger batch size results in a CUDA out-of-memory error.
According to the paper, the batch size can be increased to 24 with 4 V100 GPUs.
I didn't change your code, so I just want to confirm the batch size you used for fine-tuning.