aamir-s18 opened this issue 1 year ago
Hi, thanks a lot for your interest in the INSTRUCTOR model!
As the MEDI dataset contains a large volume of data, there is no need to train on all of it. In fact, since some sources in MEDI may contain similar data, there may be an overfitting problem if training goes up to 100k steps.
For your reference, we use the following command in the training:
python train.py --model_name_or_path sentence-transformers/gtr-t5-large --output_dir {output_directory} --cache_dir {cache_directory} --max_source_length 512 --num_train_epochs 10 --save_steps 500 --cl_temperature 0.01 --warmup_ratio 0.1 --learning_rate 2e-5 --overwrite_output_dir
Feel free to add any further questions or comments!
Hey,
But for your published model, what data exactly did you train it on?
Also, the loss and batch size are missing from your report. If you say 40k steps, for example, the number of samples seen differs a lot depending on the batch size. It would be great if you could report the exact training setup so that others can replicate and verify your work.
Thanks!
Hi, we train the model on the MEDI data, which you can download from https://drive.google.com/file/d/1vZ5c2oJNonGOvXzppNg5mHz24O6jcc52/view?usp=sharing. In our setting, we use a batch size of only 4.
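For anyone else picking this up, here is a minimal sketch of loading and inspecting the downloaded data. The file name and field names below are my assumptions about the MEDI release (the zip is expected to contain a single medi-data.json with query/pos/neg/task_name entries); please check them against the actual download.

```python
import json

# Sketch only: path and field names are assumptions about the MEDI release;
# adjust them to whatever the downloaded file actually contains.
with open("medi-data.json") as f:
    medi = json.load(f)

print(f"number of training triples: {len(medi):,}")
example = medi[0]
print(example.keys())     # expected: query, pos, neg, task_name
print(example["query"])   # expected: an [instruction, text] pair
```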
Hey,
could you please report the loss as well? So does that mean you only trained it on 4 × 40k data samples of the MEDI dataset, for a single epoch?
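As a quick sanity check on what those numbers would imply (the step and batch-size figures are the ones discussed in this thread; the MEDI total below is a placeholder, not a number from the authors):

```python
# Back-of-the-envelope check of how much data a given step budget covers.
steps = 40_000          # step count discussed in this thread
batch_size = 4          # batch size reported above
medi_size = 1_000_000   # PLACEHOLDER: replace with len(medi) from the downloaded file

pairs_seen = steps * batch_size
print(f"pairs seen: {pairs_seen:,}")                                # 160,000
print(f"fraction of one MEDI epoch: {pairs_seen / medi_size:.1%}")
```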
Hi,
a batch size of 4 is very small for contrastive learning; maybe it should be larger, such as 32 or 64?
Yes, the model would probably be better with a larger training batch size. However, due to machine limitations, we leave further scaling to future work!
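For readers following along, the reason the batch size matters so much here is that with in-batch negatives it directly sets the number of negatives per query. Below is a minimal sketch of such a loss, assuming plain InfoNCE-style in-batch negatives; this is an illustrative reimplementation, not the repository's code, and only the 0.01 temperature is taken from the command above.

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_emb, pos_emb, temperature=0.01):
    """Contrastive loss with in-batch negatives.

    query_emb, pos_emb: (batch_size, dim) embeddings of paired texts.
    Every other positive in the batch acts as a negative for a given query,
    so the number of negatives per query is batch_size - 1.
    """
    query_emb = F.normalize(query_emb, dim=-1)
    pos_emb = F.normalize(pos_emb, dim=-1)
    # (batch_size, batch_size) matrix of cosine similarities, scaled by temperature
    logits = query_emb @ pos_emb.T / temperature
    # The correct match for query i is positive i, i.e. the diagonal entries
    labels = torch.arange(query_emb.size(0), device=query_emb.device)
    return F.cross_entropy(logits, labels)
```

With a batch size of 4 each query is contrasted against only 3 negatives, versus 31 or 63 at batch sizes of 32 or 64, which is why the batch size is such an important knob for this objective.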
Hey, I had a small question.
Where can we change the batch_size? I can't find any argument for it.
Thanks
Hi, you may change the batch size via the argument per_device_train_batch_size.
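For example, the command shared earlier in this thread with that flag added (the value 32 here is just an illustration, not a setting from the authors):
python train.py --model_name_or_path sentence-transformers/gtr-t5-large --output_dir {output_directory} --cache_dir {cache_directory} --max_source_length 512 --num_train_epochs 10 --save_steps 500 --cl_temperature 0.01 --warmup_ratio 0.1 --learning_rate 2e-5 --per_device_train_batch_size 32 --overwrite_output_dir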
Got it, thank you for the help.
Hi, I am also trying to replicate your work. May I know how many GPUs you used in training?
Hi, we use only a single GPU in the training.
Hey, we are currently trying to replicate the Instructor model. Issue #14 already asks this, but could you please report the exact training setup for the models?
Also, I am interested in your model's training loss. I didn't get your reported results by running the model for 100k steps, and it is not clear to me how you used just 40k steps when your paper says the model was trained on the MEDI dataset.
I would appreciate your help here :)
Hey! I also ran into issues reproducing the results. Have you managed to replicate INSTRUCTOR's performance? Even with the exact same settings, I couldn't. If you have succeeded, could you please give me some advice? Thank you very much.
@EliverQ could you hit me up through email aamir.shakir [at] epfl.ch
Hi, I have the same issue and cannot replicate the results reported in the paper. Could the authors provide the exact training commands for the released checkpoints?