kongds / scaling_sentemb

Scaling Sentence Embeddings with Large Language Models

What causes the training of the model to be ineffective? #6

Closed YuboFeng2023 closed 9 months ago

YuboFeng2023 commented 9 months ago

Hi.

What a fabulous sentence embedding model you have created! I am trying to reproduce your "3.3 Contrastive learning with efficient fine-tuning" using your ft_llm.py script, with the same settings as yours: OPT-1.3b, epoch=1, learning_rate=5e-4, etc. But when I evaluate the generated checkpoints, the performance is unchanged. In other words, the training seems to have no effect, because the generated checkpoints are all identical.

Could you please offer some advice on this problem? Thank you very much!

kongds commented 9 months ago

Can you provide your training script and evaluation script?

kongds commented 9 months ago

If you train opt-1.3b with CUDA_VISIBLE_DEVICES=0,1 bash train_llm.sh opt-1.3b, you will find a directory named opt-1.3b-lora, which contains the checkpoints.
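To double-check that the saved adapters are actually changing during training, comparing the LoRA weights of two checkpoints directly can help. The snippet below is only a minimal sketch, not part of the repo: it assumes the checkpoints under opt-1.3b-lora are PEFT adapters saved as adapter_model.bin, and the checkpoint step numbers in the paths are placeholders.

```python
# Minimal sketch (not from this repo): compare the LoRA weights of two
# checkpoints to see whether training actually updated them.
# Assumes PEFT adapters saved as adapter_model.bin; newer peft versions
# may save adapter_model.safetensors instead, so adjust the filename.
import torch

# Checkpoint step numbers below are placeholders; use the ones you have.
ckpt_a = torch.load("opt-1.3b-lora/checkpoint-100/adapter_model.bin", map_location="cpu")
ckpt_b = torch.load("opt-1.3b-lora/checkpoint-200/adapter_model.bin", map_location="cpu")

for name, tensor_a in ckpt_a.items():
    delta = (tensor_a.float() - ckpt_b[name].float()).abs().max().item()
    print(f"{name}: max |delta| = {delta:.6e}")
# If every delta is ~0, the adapters are identical and the fine-tuning
# had no effect on the saved weights.
```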

After training, you can run bash eval_checkpoints.sh opt-1.3b-lora, which will evaluate each checkpoint on the STS-B validation set and then test the best checkpoint on the STS benchmarks.
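For context, the STS-B selection step amounts to scoring each checkpoint's embeddings against the gold similarity labels. The snippet below is only an illustrative sketch of that scoring (Spearman correlation of cosine similarities), not the repo's actual eval_checkpoints.sh pipeline; the encode callable is a hypothetical stand-in for whatever produces sentence embeddings from a checkpoint.

```python
# Illustrative sketch only, not the repo's eval_checkpoints.sh pipeline:
# an STS-B style score is the Spearman correlation between the cosine
# similarities of the sentence-pair embeddings and the gold scores.
import numpy as np
from scipy.stats import spearmanr

def sts_score(encode, sent_pairs, gold_scores):
    """encode: hypothetical callable mapping a list of sentences to an (n, d) array."""
    emb_a = encode([a for a, _ in sent_pairs])
    emb_b = encode([b for _, b in sent_pairs])
    # Cosine similarity for each sentence pair.
    sims = (emb_a * emb_b).sum(axis=1) / (
        np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
    )
    # The checkpoint with the highest correlation on the STS-B dev set is kept.
    return spearmanr(sims, gold_scores).correlation
```

If all checkpoints really are identical, this score will not change from one checkpoint to the next, which matches the behavior reported above.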