Open 41924076 opened 8 months ago
I attempted to replicate the training process of bloom-560m using the following parameters, but my training results were unsatisfactory. I suspect there may be an issue with the training parameters I used.
Params (8 A800 GPUs):
"batch_size": 1024, "negs_per_ins": 8, "max_seq_length": 300,
"model_name": "(mydir)/bloom-560m", "seed": 0, "steps_per_epoch": null,
"epochs": 3, "setting": "spec", "pooling": "lasttoken", "default_type": "query",
"dataset": "allnli,msmarco", "debias_batch": true, "warmup_epoch": 0.25,
"lr": 0.0004, "model_save_path": "(mydir)", "use_amp": false,
"wandb": false, "wandbwatchlog": "all", "local_rank": -1, "freeze": false,
"freezenonbias": true, "unfreezewte": false, "chunksize": 64, "tf32": true,
"devset": "all"
In each epoch, the log shows: dataloader length: 762, CKPT save steps: 96, warmup steps: 24.
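One consistent reading of these logged numbers (an assumption on my part, not something confirmed from `train.py`) is that the 762 batches are split across the 8 GPUs, giving 96 steps per epoch per process, and that warmup is `warmup_epoch` times that:

```python
import math

# Hypothetical reconstruction of the logged values; train.py may compute
# them differently -- this is only an arithmetic consistency check.
dataloader_length = 762   # batches per epoch, from the log
num_gpus = 8              # 8x A800
warmup_epoch = 0.25       # from the training params

steps_per_epoch = math.ceil(dataloader_length / num_gpus)  # 96
warmup_steps = int(steps_per_epoch * warmup_epoch)         # 24

print(steps_per_epoch, warmup_steps)  # 96 24
```

These match the logged "CKPT save steps: 96" and "warmup steps: 24".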
I used args.devset == 'all'; the evaluation results during training are:
epoch 1: msmarco_dev_small ndcg@10=0.26402, mrr@10=0.21343, stsbenchmark cos_sim_spearman_score=0.70745
epoch 2: msmarco_dev_small ndcg@10=0.28632, mrr@10=0.23291, stsbenchmark cos_sim_spearman_score=0.72313
epoch 3: msmarco_dev_small ndcg@10=0.29121, mrr@10=0.23728, stsbenchmark cos_sim_spearman_score=0.72577
Thank you for your interest in our work.
All training logs with args (on A100 GPU):
Thank you so much for sharing your log file!
@izhx Hello, I followed your 560m parameter settings and trained on 4 A800 GPUs, but my first-epoch loss and eval results differ from your log. Could you advise on what the possible causes might be?
My epoch 1: msmarco_dev_small ndcg@10=0.32329, mrr@10=0.27043, stsbenchmark cos_sim_spearman_score=0.74720
Your epoch 1: msmarco_dev_small ndcg@10=0.34639, mrr@10=0.28911, stsbenchmark cos_sim_spearman_score=0.82822
Could this be due to a difference in the accelerate config or in some library version? Could you share your accelerate config and the accelerate launch command you used?
Hello, both the data and the library versions could be factors.
I didn't use any special accelerate config; the launch command is as follows:
accelerate launch train.py \
--model_name bigscience/bloom-560m \
--lr 4e-4 --epochs 5 \
--setting spec --pooling lasttoken \
--default_type query \
--negs_per_ins=8 \
--dataset msmarco,allnli \
--batch_size 256 \ # 4 GPUs used here
--chunksize 128 \
--max_seq_length=300 --warmup_epoch 0.25 \
--freezenonbias --tf32 --debias_batch \
--model_save_path XXXX
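One hedged way to reconcile this command (batch_size 256 on 4 GPUs) with the paper's batch size of 1024 on 8 GPUs is to assume that `--batch_size` in `train.py` is a per-process value; this is only my assumption, not confirmed by the repo:

```python
# Hypothetical check: if --batch_size is the per-process batch size
# (an assumption, not confirmed from train.py), the 4-GPU command
# reproduces a global batch of 1024.
def global_batch(per_process_batch: int, num_gpus: int) -> int:
    return per_process_batch * num_gpus

print(global_batch(256, 4))  # 1024
```

If `--batch_size` is instead already global, the two runs would have different effective batch sizes, which could itself explain part of the gap.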
My main library versions are:
accelerate 0.21.0
mteb 1.0.2
sentence-transformers 2.2.2 # version with the lasttoken fix from this PR: https://github.com/UKPLab/sentence-transformers/pull/2111
torch 2.0.0
transformers 4.29.2
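To rule out version drift when reproducing, the versions listed above could be pinned in a requirements file (pins taken directly from the list; note sentence-transformers additionally needs the lasttoken PR applied):

```
accelerate==0.21.0
mteb==1.0.2
sentence-transformers==2.2.2
torch==2.0.0
transformers==4.29.2
```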
Thank you very much for your reply. The setup looks almost identical to mine; this performance gap is quite mysterious.
@izhx Could you kindly share the training scripts for models of varying scales, or alternatively the complete set of training parameters used at each scale? I saw the following parameters in your paper (AdamW, learning rate=4e-4, warmup period=0.25 epochs, batch size=1024, 8 negative examples, 8 A100-80GB GPUs, TF32). I'd like to know the values of the other parameters: whether all scales of the BLOOM models use epochs=3, chunksize=64, debias_batch, freezenonbias, and the default max_seq_length=300 as written in train.py, and whether any additional parameters were used.