Open jq-zhou opened 3 days ago
Hi @jq-zhou and thanks for the interest!
Definitely happy to look into this - do you mind clarifying a bit? I think my confusion is that both 3/5 and -3/5 are within the -1 to 1 range.
Thank you for your response.
Based on my understanding, p-MRR should be positive for items whose rank improves after instructions are applied, and negative for items whose rank decreases. Specifically:
In this way, if the overall score approaches 1, it would indicate strong instruction-following ability, while a score approaching -1 would suggest poor instruction-following ability.
Ah, is your question about the sign of the score? For the first case, if you start at $R_{og} = 5$ and $R_{new} = 2$ (where the document is now ranked as more relevant), you would expect a negative score.
This is because the documents that were changed in FollowIR w.r.t. the new instruction are no longer relevant, and so they should be ranked lower in the new ranking. So if a document moves closer to the top, the model is doing the opposite of what the instruction asks.
If I misunderstood your question, please let me know!
It seems I misunderstood your formula. I initially thought p-MRR was calculated using documents related to the instruction.
Thank you for clarifying this, and I really appreciate your help!
> It seems I misunderstood your formula. I initially thought p-MRR was calculated using documents related to the instruction.
You are right though, it is calculated using those - sorry if I was not clear. So say you have five relevant documents and two have been changed to be non-relevant (newly non-relevant) in the new instruction setting. You would loop over the newly non-relevant documents (the two) and calculate p-MRR for each one, then average over all of them for that query score (and then average over all queries for the final score).
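To make the averaging described above concrete, here is a minimal Python sketch. It is not the FollowIR evaluation code: the function names and the input layout (a dict mapping query IDs to lists of `(R_og, R_new)` rank pairs for the newly non-relevant documents) are hypothetical. The per-document formula is assumed to be the piecewise one, `MRR_og / MRR_new - 1` when `R_og > R_new` and `1 - MRR_new / MRR_og` otherwise, with `MRR = 1 / R`, which reproduces the 3/5 and -3/5 values mentioned earlier in the thread.

```python
from statistics import mean


def pmrr_single(r_og: int, r_new: int) -> float:
    """Score for one newly non-relevant document (assumed piecewise formula)."""
    if r_og > r_new:
        # Document moved toward the top: MRR_og / MRR_new - 1 = r_new / r_og - 1,
        # which is negative (the model went against the instruction).
        return r_new / r_og - 1
    # Document stayed put or moved down: 1 - MRR_new / MRR_og = 1 - r_og / r_new,
    # which is zero or positive.
    return 1 - r_og / r_new


def pmrr_query(rank_pairs: list[tuple[int, int]]) -> float:
    """Average over the newly non-relevant documents of one query."""
    return mean(pmrr_single(r_og, r_new) for r_og, r_new in rank_pairs)


def pmrr_overall(queries: dict[str, list[tuple[int, int]]]) -> float:
    """Average the per-query scores over all queries."""
    return mean(pmrr_query(pairs) for pairs in queries.values())


if __name__ == "__main__":
    # Hypothetical ranks: (rank with original instruction, rank with new instruction).
    example = {
        "q1": [(5, 2), (2, 5)],  # one document moved up, one moved down
        "q2": [(1, 10)],         # document pushed down, as the instruction asks
    }
    print(pmrr_single(5, 2))
    print(pmrr_single(2, 5))
    print(pmrr_overall(example))
```

Under this assumed formula each per-document score lies strictly between -1 and 1, so the per-query and overall averages do as well.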
Definitely feel free to ask any other clarifying questions. This is also great feedback for me to update the paper and make it clearer :)
Hi, thank you for your valuable work! I have a question regarding the p-MRR formula presented in your paper.
In the paper, you mention that the normalized range for p-MRR is from the worst possible change (i.e., -1) to the best possible change (i.e., 1). However, during actual calculations, I noticed some inconsistencies:
These results seem to contradict the normalized range you mentioned. Could you clarify whether there might be an issue with the formula, or if there's something I'm missing in my understanding of the calculation?