Closed tianylin98 closed 3 years ago
Hi,
For the unsupervised model, you should use --pooler cls_before_pooler for the evaluation.
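A minimal sketch of how that flag could be passed when launching the evaluation from Python. Only `--pooler cls_before_pooler` comes from this thread; the script name `evaluation.py`, the `--model_name_or_path` flag, and the checkpoint path are assumptions for illustration, so check them against the repo's README before running anything.

```python
import shlex

def build_eval_command(checkpoint_dir, pooler="cls_before_pooler"):
    """Assemble the evaluation command as an argument list.

    "evaluation.py" and "--model_name_or_path" are assumed names,
    not confirmed by this thread.
    """
    return [
        "python", "evaluation.py",
        "--model_name_or_path", checkpoint_dir,
        "--pooler", pooler,
    ]

# Hypothetical checkpoint directory, used only to show the resulting command.
cmd = build_eval_command("path/to/unsup-checkpoint")
print(shlex.join(cmd))
```

Building the command as a list keeps the pooler choice in one place, which makes it harder to evaluate an unsupervised checkpoint with the wrong pooler by accident.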
Thanks for the reply. The numbers look normal now.
@boredtylin
Thanks for the reply. The numbers look normal now.
Do you mind letting me know what numbers you got?
for example:
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
| 70.06 | 81.01 | 75.25 | 82.06 | 76.53 | 77.01 | 71.24 | 76.17 |
+-------+-------+-------+-------+-------+--------------+-----------------+-------+
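As a quick sanity check on a table like this one, the Avg. column can be recomputed from the seven task scores. The sketch below assumes Avg. is the unweighted mean of the per-task scores, which the arithmetic for this row supports.

```python
# Scores copied from the table above, in column order.
scores = {
    "STS12": 70.06,
    "STS13": 81.01,
    "STS14": 75.25,
    "STS15": 82.06,
    "STS16": 76.53,
    "STSBenchmark": 77.01,
    "SICKRelatedness": 71.24,
}

def average_score(task_scores):
    """Return the unweighted mean of the per-task scores, rounded to 2 decimals."""
    return round(sum(task_scores.values()) / len(task_scores), 2)

print(average_score(scores))  # 76.17, matching the Avg. column
```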
@boredtylin I used the command they provided, with CUDA_VISIBLE_DEVICES=0 set when running the bash command. I also changed the pooler option to --pooler cls_before_pooler. Do you know what else we need to change to get the result? I appreciate your help.
I think your results are normal. The evaluation does show some variation in the results. Even with the same hyper-parameters, I do sometimes get results lower than yours (e.g., Avg=73). I think the authors only report the result for the best random seed.
You might wanna try varying the random seed by setting --seed in the training script (e.g., run_unsup_example.sh).
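When comparing runs with different seeds, it can help to look at the spread rather than a single number. The sketch below is one way to summarize the Avg. score over several runs; the helper name and run labels are hypothetical, and the two values are the only figures mentioned in this thread (76.17 from the table, and the approximate 73 quoted above).

```python
from statistics import mean

def summarize_runs(avg_by_run):
    """Summarize the Avg. score of several training runs keyed by a run label."""
    values = list(avg_by_run.values())
    return {
        "runs": len(values),
        "mean": round(mean(values), 2),
        "min": min(values),
        "max": max(values),
        "spread": round(max(values) - min(values), 2),
    }

# Run labels are placeholders; 73.0 stands in for the approximate "Avg=73".
runs = {"run_a": 76.17, "run_b": 73.0}
summary = summarize_runs(runs)
print(summary)
```

A spread of a few points between seeds, as here, would be consistent with the variation described in the comment above.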
Hi, I tried running the script in this repo with the following command:
After the model is trained, the automatic evaluation shows the following results:
I believe the results so far look pretty normal.
But when I evaluate the checkpoint using the recommended command,
the numbers look a little abnormal:
For comparison, I also ran the evaluation using the Hugging Face-hosted checkpoint:
Did I do something wrong / misunderstand something here?