xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Details about the Training Configurations #32

Closed Lancelot39 closed 1 year ago

Lancelot39 commented 1 year ago

Thanks for your exciting work and for publicly releasing the code and data. This is great work in sentence representation learning!

I would like to follow this work and re-implement the training process, but I am not clear about the per-device batch size and total batch size used when tuning the large and XL models. I am also a bit confused about the actual number of training steps for each model. Could you answer these questions? I would like to use them to estimate how many 32 GB V100 GPUs are required :)

Harry-hash commented 1 year ago

Hi, thanks a lot for your interest in the INSTRUCTOR model!

per-device-batch-size refers to the batch size on a single device, i.e., a single GPU, while total-batch-size refers to the effective batch size across all devices, e.g., 8 GPUs. In our experiments, we train the model for 20K steps, but you may stop earlier or train for longer depending on the setting you want to optimize for.
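
Since the two quantities are related by simple multiplication, the GPU estimate follows directly. Here is a minimal Python sketch of that arithmetic; the function names, the gradient-accumulation term, and the per-device capacity in the example are illustrative assumptions, not values taken from the INSTRUCTOR configs.

```python
def total_batch_size(per_device_batch_size: int,
                     num_gpus: int,
                     grad_accum_steps: int = 1) -> int:
    """Effective batch size seen by the optimizer per update step."""
    return per_device_batch_size * num_gpus * grad_accum_steps


def gpus_needed(target_total_batch_size: int,
                per_device_batch_size: int,
                grad_accum_steps: int = 1) -> int:
    """GPUs required to reach a target total batch size, rounding up."""
    per_gpu_per_step = per_device_batch_size * grad_accum_steps
    return -(-target_total_batch_size // per_gpu_per_step)  # ceiling division


# Example: if a 32 GB V100 fits 4 examples per device for the XL model
# (an illustrative number, not a measured one), reaching a total batch
# size of 32 without gradient accumulation requires:
print(gpus_needed(target_total_batch_size=32, per_device_batch_size=4))  # -> 8
```

Gradient accumulation lets you trade GPU count for wall-clock time: halving the number of GPUs while doubling `grad_accum_steps` keeps the effective batch size unchanged.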

Feel free to add any further questions or comments!

hongjin-su commented 1 year ago

Feel free to re-open the issue if you have any further questions or comments!