judgo-system / judgo

A preference judgment system for document ranking
http://preference-judgment.herokuapp.com
MIT License

Do we really want to preference judge documents with incorrect answers? #19

Open profsmucker opened 2 years ago

profsmucker commented 2 years ago

Judging documents with incorrect answers feels dumb. I know that a highly credible, well-written wrong answer should be considered more harmful, but doing the preference judging with these docs seems like a lot of work when we really just want to be sure they are removed from view.

I suppose that if we were to find really good credible sources with the incorrect answer, then we should doubt our answer selection.

This is related to this issue: https://github.com/judgo-preference-judgment/judgo-health-misinformation/issues/15

profsmucker commented 2 years ago

Perhaps we only do preferences on incorrect docs if we have time.

profsmucker commented 2 years ago

Charlie and I decided on Aug 5:

Include unclear documents with correct ones if the number of correct documents is below a threshold. If they are not included with correct, all unclear documents sit at a lower preference level. TODO: determine threshold

This year we will only do preferences for the top k of the correct documents and will not do them for incorrect documents. TODO: determine k

We will tell the assessor the correct answer and then ask "Which document would best help the searcher reach a correct decision?"
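A minimal sketch of the pooling rule decided above, assuming each document already carries one of three answer labels. The `Doc` class, the label strings, and the `build_pools` function are hypothetical names for illustration and are not taken from the judgo code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    answer: str  # assumed labels: "correct", "unclear", or "incorrect"


def build_pools(docs, threshold):
    """Return (preference_pool, lower_level) under the Aug 5 rule.

    Incorrect documents are left out of both lists because they are
    not preference judged.
    """
    correct = [d for d in docs if d.answer == "correct"]
    unclear = [d for d in docs if d.answer == "unclear"]
    if len(correct) < threshold:
        # Too few correct documents: judge unclear alongside correct.
        return correct + unclear, []
    # Enough correct documents: unclear all fall to a lower level.
    return correct, unclear
```

Preference judging for the top k would then run only over the first list returned; the values of `threshold` and k are still open, as noted in the TODOs.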

profsmucker commented 2 years ago

@claclark Should we make the threshold the same as k? That makes sense to me, but it relies on the assessors being correct in their judgments, and on very-useful-correct being preferred to useful-correct, which is preferred to very-useful-unclear, which is preferred to useful-unclear. We could also set the threshold to something like 2*k.

I think we should probably just set the threshold to k.

Based on your email, k should be 10. Right? Okay with me for k to be 10.
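To make the trade-off concrete, here is a small sketch of the assumed grade ordering and of how the threshold choice changes whether unclear documents enter the preference pool. The grade strings follow the comment above; the function names and the example count of 15 correct documents are hypothetical.

```python
# Assumed grade order from the comment above, most preferred first.
ASSUMED_ORDER = [
    "very-useful-correct",
    "useful-correct",
    "very-useful-unclear",
    "useful-unclear",
]
RANK = {grade: i for i, grade in enumerate(ASSUMED_ORDER)}


def order_by_assumed_preference(graded_docs):
    """Sort (doc_id, grade) pairs by the assumed order; ties keep input order."""
    return sorted(graded_docs, key=lambda pair: RANK[pair[1]])


def unclear_enters_pool(n_correct, threshold):
    """Unclear documents join the pool only when correct ones are below threshold."""
    return n_correct < threshold


k = 10
n_correct = 15  # hypothetical pool size
with_threshold_k = unclear_enters_pool(n_correct, k)
with_threshold_2k = unclear_enters_pool(n_correct, 2 * k)
```

With 15 correct documents and k = 10, a threshold of k keeps unclear documents out of the pool, while a threshold of 2*k brings them in. The second option leans less on the assumed ordering holding for every topic, at the cost of more judging.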

claclark commented 2 years ago

I still think it should be possible to see the pools before we make a final decision.

(Sent by email in reply to Mark D. Smucker's comment of Thu, Aug 18, 2022 at 3:43 PM, which appears in full above.)