izhx / uni-rep

Code for embedding and retrieval research.
MIT License

training scripts or parameters #2

Open 41924076 opened 8 months ago

41924076 commented 8 months ago

@izhx Could you kindly share the training scripts for models of varying scales, or alternatively, the complete set of training parameters used for each scale? I saw the following parameters in your paper (AdamW, learning rate=4e-4, warmup period=0.25 epochs, batch size=1024, 8 negative examples, 8 A100-80GB GPUs, TF32). I'd also like to know the values of the other parameters: whether all scales of the BLOOM models use epochs=3, chunksize=64, debias_batch, freezenonbias, and max_seq_length=300 (the default in train.py), and whether any additional parameters are involved.

41924076 commented 8 months ago

I attempted to replicate the training of bloom-560m using the following parameters, but my results weren't satisfactory. I suspect there might be an issue with the training parameters I used.

Parameters (8 A800 GPUs):

    {
        "batch_size": 1024, "negs_per_ins": 8, "max_seq_length": 300,
        "model_name": "(mydir)/bloom-560m", "seed": 0, "steps_per_epoch": null,
        "epochs": 3, "setting": "spec", "pooling": "lasttoken",
        "default_type": "query", "dataset": "allnli,msmarco",
        "debias_batch": true, "warmup_epoch": 0.25, "lr": 0.0004,
        "model_save_path": "(mydir)", "use_amp": false, "wandb": false,
        "wandbwatchlog": "all", "local_rank": -1, "freeze": false,
        "freezenonbias": true, "unfreezewte": false, "chunksize": 64,
        "tf32": true, "devset": "all"
    }

In each epoch, the log shows: dataloader length: 762, CKPT save steps: 96, warm up step: 24.
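
A quick arithmetic check (an illustrative sketch; the formulas below are assumptions, not taken from train.py) suggests these logged values are internally consistent if the dataloader is sharded across the 8 GPUs:

    import math

    # Sanity check of the logged step counts, under two assumptions that are
    # NOT verified against train.py: (1) the 762-batch dataloader is sharded
    # across the 8 GPUs, and (2) warmup_steps = warmup_epoch * steps_per_epoch.
    dataloader_length = 762   # "dataloader length" from the log
    num_gpus = 8
    warmup_epoch = 0.25

    steps_per_epoch = math.ceil(dataloader_length / num_gpus)  # 96, matching "CKPT save steps"
    warmup_steps = int(warmup_epoch * steps_per_epoch)         # 24, matching "warm up step"
    print(steps_per_epoch, warmup_steps)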

I used args.devset == 'all'. The evaluation results during training were:

- epoch 1: msmarco_dev_small ndcg@10=0.26402, mrr@10=0.21343, stsbenchmark cos_sim_spearman_score=0.70745
- epoch 2: msmarco_dev_small ndcg@10=0.28632, mrr@10=0.23291, stsbenchmark cos_sim_spearman_score=0.72313
- epoch 3: msmarco_dev_small ndcg@10=0.29121, mrr@10=0.23728, stsbenchmark cos_sim_spearman_score=0.72577

izhx commented 8 months ago

Thank you for your interest in our work.

All training logs with args (on A100 GPU):

41924076 commented 8 months ago

Thank you so much for sharing your log file!

41924076 commented 8 months ago

@izhx Hello, I followed your parameter settings for the 560m model and trained with 4 A800 GPUs, but my first-epoch loss and eval results differ noticeably from your log. May I ask what you think the likely cause could be?

My epoch 1: msmarco_dev_small ndcg@10=0.32329, mrr@10=0.27043, stsbenchmark cos_sim_spearman_score=0.74720
Your epoch 1: msmarco_dev_small ndcg@10=0.34639, mrr@10=0.28911, stsbenchmark cos_sim_spearman_score=0.82822

41924076 commented 8 months ago

Could the difference come from the accelerate config or from a different version of some library? May I ask what your accelerate config and accelerate launch command are?

izhx commented 8 months ago

Hello, both the data and the library versions are possible causes.

I did not use any special accelerate config; the launch command is as follows:

    accelerate launch train.py \
        --model_name bigscience/bloom-560m \
        --lr 4e-4 --epochs 5 \
        --setting spec --pooling lasttoken \
        --default_type query \
        --negs_per_ins=8 \
        --dataset msmarco,allnli \
        --batch_size 256 \  # 4 GPUs were used here
        --chunksize 128 \
        --max_seq_length=300 --warmup_epoch 0.25 \
        --freezenonbias --tf32 --debias_batch \
        --model_save_path XXXX
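
(Note: if --batch_size here is a per-process value, which is an assumption rather than something confirmed in this thread, 256 × 4 GPUs gives an effective batch of 1024, matching the paper's batch size; this command also uses --epochs 5 and --chunksize 128, versus the epochs=3 and chunksize=64 tried earlier above.)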

My main library versions are:

accelerate               0.21.0
mteb                     1.0.2
sentence-transformers    2.2.2  # version with the lasttoken fix from https://github.com/UKPLab/sentence-transformers/pull/2111
torch                    2.0.0
transformers             4.29.2

41924076 commented 8 months ago

Thank you very much for your reply. Your setup looks almost the same as mine, so this performance gap is really puzzling.