princeton-nlp / SimCSE

[EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
MIT License

Cannot reproduce the result~ #53

Closed liuh236 closed 3 years ago

liuh236 commented 3 years ago

Hello, and thank you for this useful code! I tried to reproduce the unsupervised BERT+SimCSE results, but failed. My environment setup is as follows:

pytorch=1.7.1
cudatoolkit=11.1
Single RTX 3090

The following script is the training script I used (exactly the same as run_unsup_example.sh).

python train.py \
    --model_name_or_path bert-base-uncased \
    --train_file data/wiki1m_for_simcse.txt \
    --output_dir result/my-unsup-simcse-bert-base-uncased \
    --num_train_epochs 1 \
    --per_device_train_batch_size 64 \
    --learning_rate 3e-5 \
    --max_seq_length 32 \
    --evaluation_strategy steps \
    --metric_for_best_model stsb_spearman \
    --load_best_model_at_end \
    --eval_steps 125 \
    --pooler_type cls \
    --mlp_only_train \
    --overwrite_output_dir \
    --temp 0.05 \
    --do_train \
    --do_eval \
    --fp16 \
    "$@"

However, a RuntimeError is raised when training finishes. I obtained the following evaluation results:

+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg.  |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| 64.28 | 79.15 | 70.99 | 78.38 | 78.26 |    75.62     |      67.58      | 73.47 |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+

I think the gap (2.8 on average) is too large. Is it because of the error? How can I obtain the ~76 results on the STS tasks?

liuh236 commented 3 years ago

It seems that SentEval currently includes 17 downstream tasks, but the code only uses 14 of them? And how can I obtain the ~76 results?

gaotianyu1350 commented 3 years ago

Hi,

Can you paste the error you mentioned here? I think the gap might be related to that error.

SentEval contains different types of tasks (probing, STS, and transfer tasks). In our paper we take the STS tasks for the main experiment, since they directly evaluate the quality of sentence embeddings.

liuh236 commented 3 years ago

When I use the training script (exactly the same as run_unsup_example.sh), the error is as follows:

Traceback (most recent call last):
  File "train.py", line 584, in <module>
    main()
  File "train.py", line 548, in main
    train_result = trainer.train(model_path=model_path)
  File "/cluster/home/qnan/SimCSE/simcse/trainers.py", line 464, in train
    tr_loss += self.training_step(model, inputs)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/transformers/trainer.py", line 1248, in training_step
    loss = self.compute_loss(model, inputs)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/transformers/trainer.py", line 1277, in compute_loss
    outputs = model(**inputs)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward
    return self.gather(outputs, self.output_device)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    for k in out))
  File "<string>", line 7, in __init__
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/transformers/file_utils.py", line 1383, in __post_init__
    for element in iterator:
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in <genexpr>
    for k in out))
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, outputs)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/parallel/_functions.py", line 71, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/cluster/home/qnan/anaconda3/envs/ganspace/lib/python3.7/site-packages/torch/nn/parallel/comm.py", line 230, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 3 has invalid shape [14, 14], but expected [14, 17]

gaotianyu1350 commented 3 years ago

Hi,

It seems that you are using multiple GPUs (because data_parallel is called in the traceback)? Can you confirm that you are using a single GPU?

liuh236 commented 3 years ago

Hi, the logger info is as follows:

You are right, I am using multiple GPUs. But according to the README, the script (run_unsup_example.sh) is a single-GPU (or CPU) example for the unsupervised version, isn't it? What happened?

gaotianyu1350 commented 3 years ago

If you want to use a single GPU, you should use CUDA_VISIBLE_DEVICES to control which GPU devices are visible. If you run the single-GPU script on a machine with multiple GPUs, it will automatically use DataParallel and the batch will be sliced across the GPUs, leading to a smaller number of negatives per GPU. If you really want to use multiple GPUs, you should follow our guide/script for using distributed data parallel.
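
To illustrate the point above, here is a rough Python sketch. It is not taken from the SimCSE repository, and the helper names (single_gpu_env, replica_batch_sizes, in_batch_negatives) are hypothetical. It assumes the batch is split into chunks of ceil(batch_size / n_gpus), in the style of torch.chunk, and that each GPU builds its negatives only from its own slice.

```python
import math
import os


def single_gpu_env(gpu_id=0, base_env=None):
    """Return a copy of the environment in which CUDA sees only one GPU.

    CUDA_VISIBLE_DEVICES is the standard CUDA environment variable; the
    helper itself is a hypothetical convenience wrapper.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def replica_batch_sizes(batch_size, n_gpus):
    """Sizes of the per-GPU slices when one batch is split across n_gpus.

    Assumption: chunks of ceil(batch_size / n_gpus), with a smaller final
    chunk when the batch does not divide evenly.
    """
    chunk = math.ceil(batch_size / n_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= sizes[-1]
    return sizes


def in_batch_negatives(local_batch_size):
    """Negatives per sentence when every other sentence in the local
    slice acts as a negative (one entry is the positive pair)."""
    return local_batch_size - 1


if __name__ == "__main__":
    # To launch the unmodified script on GPU 0 only, one could run:
    #   subprocess.run(["bash", "run_unsup_example.sh"], env=single_gpu_env(0))
    for n_gpus in (1, 4):
        sizes = replica_batch_sizes(64, n_gpus)
        print(n_gpus, "GPU(s):", sizes,
              "negatives per sentence:", in_batch_negatives(sizes[0]))
```

Under these assumptions, a batch of 64 on one GPU gives 63 negatives per sentence, while the same batch sliced over four GPUs gives only 15. An uneven split, for example 65 sentences over four GPUs giving slices of 17, 17, 17 and 14, would be consistent with the shapes in the RuntimeError above, although the thread does not confirm the actual batch or GPU count.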

liuh236 commented 3 years ago

Thanks for your help! I finally get the right score of ~76.

liuh236 commented 3 years ago

You did a nice job! I'm now closing the issue. Thanks again. 👍

liuleiBUAA commented 2 years ago

Hi, I have the same problem. How should I change the unsupervised code?