juliema / label_reconciliations

Code for reconciling multiple transcriptions for a label

Add ability to trust or distrust transcriptions from specific user IDs. #51

Closed: CapPow closed this 6 years ago

CapPow commented 6 years ago

This adds the ability to pass user names and corresponding score modifiers when reconciling text fields. The added function might be useful if a project has a proven, reliable power user or a consistently inaccurate user. Please review the bold text while considering the request.

rafelafrance commented 6 years ago

This looks both useful and sound. I'm going to have to run it by the product owner (@juliema) and I have some datasets that I can use to look for meaningful differences in output.

PmasonFF commented 6 years ago

If I am reading this correctly, trusted (or distrusted) user weights are hardcoded at line 14 of userWeightedText.py. First, this sort of scheme is only as good as the ability to determine valid weights for users, and hardcoding them seems a rather odd way to pass them to the code.

My limited experience with weighting is that it adds complexity with very little improvement in overall accuracy, loses transparency, and is a step down a slippery slope toward data manipulation. This is admittedly less of an issue for transcriptions, where there is a "correct" answer. Would this apply only to NfN transcriptions, or to all reconciliations including the csv format? Peter

CapPow commented 6 years ago

The userWeightedText.py file and its hardcoded values are not part of the pull request. Having those commits in the history is a result of my git inexperience; sorry for the confusion. The feature was rolled into text.py using the optional argument "--user-weights". The "Files Changed" tab at the top of this request should provide a more concise view of the proposed changes.
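For a rough idea of the shape of the interface, the flag could parse user/weight pairs along these lines. This is an illustrative sketch only: "--user-weights" is the real argument name, but the pair format and helper below are hypothetical.

```python
# Hypothetical sketch of parsing a --user-weights argument.
# The flag name comes from this PR; the "user:weight" pair format
# and this helper are illustrative assumptions.
import argparse

def parse_user_weights(raw):
    """Parse comma-separated 'user:weight' pairs into a dict."""
    weights = {}
    for pair in raw.split(','):
        user, _, weight = pair.partition(':')
        weights[user.strip()] = float(weight)
    return weights

parser = argparse.ArgumentParser()
parser.add_argument('--user-weights', type=parse_user_weights, default={},
                    help='Comma-separated user:weight pairs')

args = parser.parse_args(['--user-weights', 'alice:1.25,bob:0.8'])
print(args.user_weights)  # {'alice': 1.25, 'bob': 0.8}
```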

I agree, selecting appropriate values is problematic. This was written as an option for known outliers on some of our projects: for example, a transcriber who has uniformly misinterpreted a field, such as by entering annotators' names as collectors.

When the script is run with the "--user-weights" argument, these changes affect all "free text" fields chosen by top_partial_ratio (aka "simple in-place fuzzy matches"). If the feature is deemed appropriate for adoption, it could also be implemented for top_token_set_ratio (aka "best token" matches).
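As a sketch of how a per-user multiplier could interact with those scores: fuzz.partial_ratio is the real fuzzywuzzy call, but the weighting arithmetic here is an illustrative assumption, not necessarily what text.py does.

```python
# Illustrative sketch: scale a fuzzy-match score by a per-user weight.
from fuzzywuzzy import fuzz

user_weights = {'trusted_user': 1.25, 'noisy_user': 0.75}

def weighted_partial_ratio(a, b, user):
    score = fuzz.partial_ratio(a, b)            # 0-100 similarity
    return score * user_weights.get(user, 1.0)  # unknown users get 1.0

print(weighted_partial_ratio('Quercus alba L.', 'Quercus alba', 'trusted_user'))
print(weighted_partial_ratio('Quercus alba L.', 'Quercus alba', 'noisy_user'))
```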

CapPow commented 6 years ago

Thank you everyone for the productive feedback. I'll be happy to tackle points 1-4.

rafelafrance commented 6 years ago

@PmasonFF Your points are well taken. I tend to err on the side of making code go away, and transparency is always important. We want to experiment with biasing the selection toward some known users -- there are some really good and productive transcribers. This has been talked about for a while. The point system seems simple enough, and I think it may do what the PhDs want it to do.

CapPow commented 6 years ago

I believe this addresses the feedback. Please note: I also changed summary.py to address an encoding error I encountered on Windows. I would appreciate it being checked in other environments to make sure it does not break anything!
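For context on the Windows issue: Python's default text-file encoding there is typically cp1252, so writing characters outside it raises UnicodeEncodeError. The usual fix is to pass an explicit encoding; a minimal sketch of that pattern (the actual change lives in summary.py):

```python
# On Windows, open() defaults to the locale encoding (often cp1252),
# so non-ASCII output can raise UnicodeEncodeError. Being explicit
# avoids the platform default. Illustrative sketch only.
with open('summary.html', 'w', encoding='utf-8') as out_file:
    out_file.write('Héliotrope: non-ASCII text writes cleanly')
```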

rafelafrance commented 6 years ago

All points have been addressed, and testing turned up no problems. I particularly like the use of the append=True argument on set_index. TIL.
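For anyone else who hadn't seen it: append=True keeps the existing index and adds the new column to it, producing a MultiIndex rather than replacing the index.

```python
import pandas as pd

df = pd.DataFrame({'subject_id': [1, 1, 2],
                   'user_name': ['alice', 'bob', 'alice'],
                   'value': ['x', 'y', 'z']}).set_index('subject_id')

# append=True adds user_name to the existing subject_id index,
# yielding a (subject_id, user_name) MultiIndex.
df = df.set_index('user_name', append=True)
print(df.index.names)  # ['subject_id', 'user_name']
```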