Closed by taeminlee 1 day ago
Hi @taeminlee, thank you for your further work based on CLIcK! We actually measured the highest log likelihood not of the alphabet letter alone but of the concatenation of the letter and the option string ("{alphabet}. {option_string}").
For Polyglot 1.3B, I think the model is prone to positional bias because it is too small. This may be the reason for the discrepancy between the performance measured with the log likelihood of the alphabet letter only and with alphabet + option_string.
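To make the difference between the two scoring schemes concrete, here is a minimal sketch. It is not the CLIcK or lm-evaluation-harness implementation: `pick_answer`, `build_continuations`, and `toy_loglikelihood` are hypothetical names, and the toy scorer is a made-up stand-in for a real model's log likelihood, built to mimic a model with a positional bias toward "A".

```python
from typing import Callable, List, Sequence

LETTERS = "ABCD"


def build_continuations(options: Sequence[str], with_option_text: bool) -> List[str]:
    """Build one candidate continuation per option.

    with_option_text=False -> "A", "B", ...
    with_option_text=True  -> "A. <option>", "B. <option>", ...
    """
    return [
        f"{letter}. {option}" if with_option_text else letter
        for letter, option in zip(LETTERS, options)
    ]


def pick_answer(
    context: str,
    options: Sequence[str],
    loglikelihood: Callable[[str, str], float],
    with_option_text: bool,
) -> str:
    """Return the letter whose continuation has the highest log likelihood."""
    continuations = build_continuations(options, with_option_text)
    scores = [loglikelihood(context, c) for c in continuations]
    best = max(range(len(scores)), key=scores.__getitem__)
    return LETTERS[best]


def toy_loglikelihood(context: str, continuation: str) -> float:
    """Hypothetical scorer (assumption): prefers the letter "A" (positional
    bias) but also rewards the correct option text when it is present."""
    score = -1.0 if continuation.startswith("A") else -2.0
    if "Seoul" in continuation:
        score += 5.0
    return score


question = "What is the capital of South Korea?"
options = ["Busan", "Seoul", "Incheon", "Daegu"]

letter_only = pick_answer(question, options, toy_loglikelihood, with_option_text=False)
letter_and_text = pick_answer(question, options, toy_loglikelihood, with_option_text=True)
print(letter_only, letter_and_text)
```

With this toy scorer, letter-only scoring falls back on the positional preference, while scoring the letter together with the option text lets the content decide. A real run would replace `toy_loglikelihood` with the model's summed token log probabilities for the continuation given the prompt.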
Thank you for your response. I will change the format of the choices and run it again. One more thing I'm curious about: since the prompt in the paper asks for an answer as a single letter, the prompt wording would also need to change. Below is the prompt I created with reference to the paper, and I am wondering how it should be modified.
주어진 질문을 천천히 읽고, 질문에 대한 적절한 정답을 A, B, C, D중에 골라 알파벳 하나로 답하시오.\n (Read the given Question, and choose the correct answer from options A, B, C, or D. Respond with a single alphabet.)\n 질문(Question):
Although we measured the highest log likelihood of "{alphabet}. {option_string}", we used the same prompt. If you want to revise the original evaluation procedure, possible methods could be
Hello, I am attempting to reproduce the results from the paper using the lm-evaluation-harness framework. I used the prompts from Chapter 4 of the paper and selected the answer option with the highest log-likelihood of the alphabet letter.
detailed implementation method here.
Using Polyglot 1.3B for the measurement, I obtained the following results:
These results are somewhat similar to the ones you shared, but there are slight differences.
I included both Korean and English in the prompt section, and I am wondering if this caused the issue.
I have attached the execution logs for your review.
Thank you.