Closed BaixuanLi closed 9 months ago
The correlation result for this model is around 0.38 in the paper, but I can't reproduce it.
Hi!
Are you using the training scripts provided in this repository? Do you mind posting the exact command you're using to run/evaluate?
Sure~
I'm using the .sh file and set the parameters according to the paper; the screenshot shows all the hyperparameters I've chosen.
Hey Baixuan, thank you for your patience.
I reran with the following parameters (I believe the same you submitted):
model=${MODEL:-princeton-nlp/sup-simcse-roberta-base} # pre-trained model
encoding=${ENCODER_TYPE:-cross_encoder} # cross_encoder, bi_encoder, tri_encoder
lr=${LR:-3e-5} # learning rate
wd=${WD:-0.1} # weight decay
transform=${TRANSFORM:-True} # whether to use an additional linear layer after the encoder
objective=${OBJECTIVE:-mse} # mse, triplet, triplet_mse
triencoder_head=${TRIENCODER_HEAD:-None} # hadamard, concat (set for tri_encoder)
seed=${SEED:-666}
output_dir=${OUTPUT_DIR:-output}
config=enc_${encoding}__lr_${lr}__wd_${wd}__trans_${transform}__obj_${objective}__tri_${triencoder_head}__s_${seed}
train_file=${TRAIN_FILE:-data/csts_train.csv}
eval_file=${EVAL_FILE:-data/csts_validation.csv}
test_file=${TEST_FILE:-data/csts_test.csv}
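For reference, the `${VAR:-default}` expansions in the snippet above mean each setting can be overridden from the environment without editing the script (e.g. `SEED=42 bash run_sts.sh`). A minimal sketch of how that fallback behaves, using the `SEED` and `LR` names from above:

```shell
# ${VAR:-default} falls back to the default when VAR is unset or empty.
unset SEED LR               # simulate running without overrides
seed=${SEED:-666}
lr=${LR:-3e-5}
echo "seed=$seed lr=$lr"    # prints: seed=666 lr=3e-5

# With an override in the environment, the default is skipped:
SEED=42
seed=${SEED:-666}
echo "seed=$seed"           # prints: seed=42
```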
I ran 3 seeds with an NVIDIA A6000 and got the following results on TEST:
Seed 42: MSE: 9.172729371293485, spearmanr: 0.3829023341280033, pearsonr: 0.38653424827388805
Seed 101: MSE: 9.109799315896288, spearmanr: 0.39012224501669596, pearsonr: 0.3933606861384604
Seed 666: MSE: 9.248459630892224, spearmanr: 0.36515925620807504, pearsonr: 0.367544032549672
Mean: MSE: 9.176996106027334, spearmanr: 0.3793946117842581, pearsonr: 0.3824796556540068
I also tried with an NVIDIA RTX 2080 and got the following:
Seed 42: MSE: 9.14469821586109, spearmanr: 0.3940268436387526, pearsonr: 0.39658619051163446
Seed 101: MSE: 9.145529172505276, spearmanr: 0.383653880273925, pearsonr: 0.387261727044946
Seed 666: MSE: 9.223952018486575, spearmanr: 0.39191138026960576, pearsonr: 0.3953807255869861
Mean: MSE: 9.171393135617647, spearmanr: 0.3898640347274278, pearsonr: 0.39307621438118884
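(As a quick sanity check, the per-metric means above can be reproduced directly from the three seed scores; here for the A6000 Spearman numbers:)

```shell
# Average the three A6000 Spearman scores (seeds 42, 101, 666).
printf '%s\n' 0.3829023341280033 0.39012224501669596 0.36515925620807504 |
  awk '{ s += $1 } END { printf "%.10f\n", s / NR }'
# prints: 0.3793946118
```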
So these all seem pretty similar to the reported number, but there is a bit of hardware-dependent variance. Are you running on a very different GPU?
Btw, note that there are some more parameters in run_sts.sh that could impact performance. For instance, I ran with the following (included by default in run_sts.sh):
--pooler_type cls \
--freeze_encoder False \
--max_seq_length 512 \
--condition_only False \
--sentences_only False \
--do_train \
--do_eval \
--do_predict \
--evaluation_strategy epoch \
--per_device_train_batch_size 8 \
--gradient_accumulation_steps 4 \
--max_grad_norm 0.0 \
--num_train_epochs 3 \
--lr_scheduler_type linear \
--warmup_ratio 0.1 \
--save_strategy epoch \
--save_total_limit 1 \
--fp16 True
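One consequence of the flags above that's worth keeping in mind when comparing runs: the effective batch size is the per-device batch size times the gradient accumulation steps (times the number of GPUs; a single GPU is assumed here, matching the runs above):

```shell
per_device_train_batch_size=8
gradient_accumulation_steps=4
num_gpus=1   # assumption: single GPU, as in the A6000 / RTX 2080 runs
echo $(( per_device_train_batch_size * gradient_accumulation_steps * num_gpus ))
# prints: 32
```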
So if you've changed any of those parameters, it could also impact performance.
Sorry to bother you, but I can only get about 0.30 Spearman's and Pearson's correlation, even though I've set the hyperparameters according to the paper (simcse-base, cross-encoding, lr=3e-5, wd=0.1, epochs=3, transform=True). As you can see, the result is not as expected. Could you help me figure this out? I'd really appreciate it!!