Atlantic8 opened 1 year ago
I only have training data in the format `sentence1, sentence2, label`, so I cannot construct training data in the format `query=xxx, pos=[], neg=[]`.
Also, when I try to train using train.py with `--fp16 True --gradient_accumulation_steps 3`, I run out of GPU memory. I was using an A100 40G. Why does training this model take this much GPU memory? Could you tell me the GPU hardware you used to train this model?
Btw, this model can be trained only when `per_device_train_batch_size` is set to 2.
> Could you tell me the GPU hardware you used to train this model?

@Atlantic8, this is an excerpt from the paper:

> We use the maximum batch size that fits the machine memory and run all our experiments on 40GB A100 GPUs.
> Btw, this model can be trained only when `per_device_train_batch_size` is set to 2.

What's your source for this? @Atlantic8
Hi, thanks a lot for your interest in the INSTRUCTOR!
- As the INSTRUCTOR model follows the same architecture as GTR models, the same training script should be applicable.
- If you have only paired sentences (I assume they are positive pairs, e.g., question and answer), then using random negatives is probably the easiest way to construct the training data; a rough sketch follows this list.
- For the xl model, the maximum length, gradient accumulation steps and batch size should depend on your machines.
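In case it is useful, here is a minimal sketch of turning `sentence1, sentence2, label` pairs into the `query=xxx, pos=[], neg=[]` layout with random negatives. The file name, field names, and output schema below are assumptions rather than the exact format train.py expects (INSTRUCTOR training examples also carry an instruction per example), so please check the repo's data loading code first.

```python
import json
import random

random.seed(0)

# Hypothetical input file: one JSON object per line, e.g.
# {"sentence1": "...", "sentence2": "...", "label": 1}.
# The file name and field names are assumptions -- adapt them to your data.
pairs = []
with open("pairs.jsonl") as f:
    for line in f:
        row = json.loads(line)
        if row["label"] == 1:  # keep only the positive pairs
            pairs.append((row["sentence1"], row["sentence2"]))

all_positives = [s2 for _, s2 in pairs]

examples = []
for s1, s2 in pairs:
    # Random negative sampling: draw a negative from the other pairs' positives.
    candidates = [s for s in all_positives if s != s2]
    neg = random.choice(candidates) if candidates else ""
    examples.append({"query": s1, "pos": [s2], "neg": [neg]})

with open("train_data.json", "w") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)
```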
Hope this helps!
So for custom data, do we need to construct data in the format `query=xxx, pos=[], neg=[]` (with randomly sampled negatives) before running?
Can we fine-tune using train.py starting from the released model hkunlp/instructor-xl? If yes, could you please show me the shell script for training? Thanks.
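For anyone looking for a starting point, a rough sketch of a fine-tuning command is below. Only `--fp16`, `--gradient_accumulation_steps`, and `per_device_train_batch_size` are mentioned in this thread; the remaining flags are standard HuggingFace `TrainingArguments` names or assumptions, so please verify the exact argument names with `python train.py --help`.

```bash
# Sketch only: flag names other than --fp16, --gradient_accumulation_steps and
# --per_device_train_batch_size are assumptions -- verify against train.py's arguments.
python train.py \
    --model_name_or_path hkunlp/instructor-xl \
    --output_dir output/instructor-xl-finetuned \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 3 \
    --fp16 True \
    --learning_rate 2e-5 \
    --num_train_epochs 1
```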