Closed: lephong closed this issue 2 years ago
Uniformity drops very fast at the beginning of training. Can you specify your initialization and the stride (step interval) you use to calculate uniformity?
I didn't change anything except adding a few lines to compute alignment and uniformity (as mentioned before). More specifically, from run_unsup_example.sh:
python train.py \
--model_name_or_path bert-base-uncased \
--train_file data/wiki1m_for_simcse.txt \
--output_dir result/my-unsup-simcse-bert-base-uncased \
--num_train_epochs 1 \
--per_device_train_batch_size 64 \
--learning_rate 3e-5 \
--max_seq_length 32 \
--evaluation_strategy steps \
--metric_for_best_model stsb_spearman \
--load_best_model_at_end \
--eval_steps 125 \
--pooler_type cls \
--mlp_only_train \
--overwrite_output_dir \
--temp 0.05 \
--do_train \
--do_eval \
--fp16
For initialization, I didn't change the random seed, so I guess it's the Hugging Face default of 42 (not sure, I may be wrong).
If I understand correctly, you calculate alignment/uniformity every 125 steps (the same interval as validation). In the original paper, we calculate it every 10 steps because, as I mentioned, uniformity drops very fast at the beginning of training.
Ah, so you mean every 10 update steps / batches? I thought it was every 10 * 125 batches.
But even if that's the case, I'm not sure Figure 2 provides a good explanation here, because after 125 steps (i.e., about 12 of the little red stars in Figure 2) the Spearman correlation on the STS-B dev set is only around 60%, which is much lower than the 82.5% in the paper. So Figure 2 can explain what happens in the very first training phase, but the remaining gap of 82.5 - 60 = 22.5% is left unexplained.
You can use Figure 3 as a reference (although it's not a rigorous comparison, because we didn't include the CLS BERT representation, which is SimCSE's initialization, in the figure), and it's the uniformity that makes the huge difference.
That makes sense, thanks!
I'm following Wang and Isola to compute alignment and uniformity (using the code in their Fig. 5, http://proceedings.mlr.press/v119/wang20k/wang20k.pdf) in order to reproduce Fig. 2 of your paper, but I fail. What I see is that alignment decreases while uniformity is almost unchanged, which is completely different from Fig. 2. Details are below.
To compute alignment and uniformity, I changed lines 66-79 of SimCSE/blob/main/SentEval/senteval/sts.py by adding the code from Wang and Isola:
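For concreteness, the two loss definitions from their Fig. 5 look like the following (reproduced here as a self-contained PyTorch sketch; the docstring comments and shapes are my additions, and both functions assume the inputs have already been L2-normalized onto the unit hypersphere):

```python
import torch

def align_loss(x, y, alpha=2):
    # x, y: L2-normalized embeddings of positive pairs, shape (N, D).
    # Mean of ||x_i - y_i||^alpha; lower means positives are closer.
    return (x - y).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    # x: L2-normalized embeddings, shape (N, D).
    # Log of the mean pairwise Gaussian potential over all distinct pairs;
    # lower (more negative) means the embeddings are more uniformly spread.
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```

Both quantities are "lower is better": alignment is bounded below by 0, and uniformity is always non-positive.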
The output (which also shows the Spearman correlation on the STS-B dev set) is:
We can see that alignment drops from 0.26 to less than 0.20, whereas uniformity stays around -2.55. That would mean reducing alignment is the key, not uniformity, a trend that is completely different from Fig. 2.
Did you also use the code from Wang and Isola as I did? If possible, could you please share the code you used to compute alignment and uniformity?