Adding the ability to specify classes to filter user-skill calculation

AgentM-GEG commented 1 month ago

Context: The current version of the user-skill calculation runs on ALL detected classes present within a task (either the mean skill, or the skill for every class, must be above a certain skill_threshold). For a task with a large number of classes OR an imbalanced dataset, this means the user has to see at least N images per class before they can even be considered for leveling up.

Motivation: Research teams should be able to specify the classes on which the leveling-up decision is based.

This PR:

The user_skill_reducer function now takes a focus_classes argument (default: None).
If focus_classes is provided (e.g., ['square', 'triangle']), a conditional computes mean_skill, null_removed_classes, and null_removed_class_counts on this subset of classes (instead of on all classes).
As a result, the output still contains the entire confusion matrix (for all classes), but the mean skill is computed on the user-specified classes only.
Lines 87-89 were refactored because this block of code was repeated in both branches of the if binary ... else: ... statement, the only difference being null_class='False' in the binary case.
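
For readers skimming the thread, the focus_classes idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the function name mean_skill_sketch, the row-equals-true-class layout of the confusion matrix, and the definition of per-class skill as the fraction of correctly labelled items are all assumptions made for the example.

```python
import numpy as np


def mean_skill_sketch(confusion_matrix, classes, focus_classes=None):
    """Hypothetical helper: mean per-class skill, optionally on a subset of classes.

    Assumes rows of the confusion matrix are true classes and columns are
    the user's answers (an assumption, not taken from the PR).
    """
    cm = np.asarray(confusion_matrix, dtype=float)
    row_counts = cm.sum(axis=1)

    # Per-class skill: correct answers / items seen for that class.
    # Classes the user never saw get NaN and are treated as "null" below.
    with np.errstate(divide="ignore", invalid="ignore"):
        skill = np.where(row_counts > 0, np.diag(cm) / row_counts, np.nan)

    # Restrict to the focus classes if given, otherwise use every class.
    selected = [
        i for i, name in enumerate(classes)
        if focus_classes is None or name in focus_classes
    ]

    # Drop null classes (no items seen) from the selection.
    kept = [i for i in selected if row_counts[i] > 0]

    mean_skill = float(np.mean(skill[kept])) if kept else float("nan")
    return {
        "mean_skill": mean_skill,
        "null_removed_classes": [classes[i] for i in kept],
        "null_removed_class_counts": [int(row_counts[i]) for i in kept],
        # The full matrix is still returned for all classes.
        "confusion_matrix": cm.tolist(),
    }


classes = ["square", "triangle", "circle"]
cm = [
    [8, 2, 0],  # true square
    [1, 9, 0],  # true triangle
    [3, 0, 1],  # true circle
]

all_result = mean_skill_sketch(cm, classes)
focus_result = mean_skill_sketch(cm, classes, focus_classes=["square", "triangle"])
print(all_result["mean_skill"], focus_result["mean_skill"])
```

In this toy example the poorly classified circle class pulls the all-class mean down to roughly 0.65, while the mean restricted to square and triangle is roughly 0.85, which is the behaviour the description above is after.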

An example caesar config looks like this: .../reducers/user_skill_reducer?mode='one-to-one'&count_threshold=5&focus_classes=['1', '2']&strategy='all'&skill_threshold=0.2
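
To illustrate how a query string like the one above could be turned into Python values, here is a small sketch using only the standard library. It is not the reducer_wrapper code from this repository; parse_reducer_params is a hypothetical helper, and evaluating each value with ast.literal_eval is an assumption about how the quoted strings and the list literal might be decoded.

```python
import ast
from urllib.parse import parse_qs


def parse_reducer_params(query):
    """Hypothetical helper: decode URL query values into Python literals."""
    params = {}
    for key, values in parse_qs(query).items():
        raw = values[0]
        try:
            # Turns "5" into 5, "'all'" into 'all', "['1', '2']" into a list.
            params[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Fall back to the raw string if it is not a valid literal.
            params[key] = raw
    return params


# Only the query part of the example config; the path before it is elided
# in the original and is not reconstructed here.
query = (
    "mode='one-to-one'&count_threshold=5"
    "&focus_classes=['1', '2']&strategy='all'&skill_threshold=0.2"
)
params = parse_reducer_params(query)
print(params["focus_classes"])
```

Note that the class names arrive as strings ('1', '2') in this sketch, so they would need to match how the class labels are stored in the confusion matrix.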

AgentM-GEG commented 1 month ago

Tagging @ramanakumars as well for visibility and cross-checking.

lcjohnso commented 1 month ago

Hi @CKrawczyk -- Would you mind reviewing this PR?

AgentM-GEG commented 3 weeks ago

@CKrawczyk I added a test for the focus_classes behavior and pushed those changes. I also made a small change to the reducer_wrapper code so that the focus_classes argument is parsed appropriately. Let me know how these changes look.

AgentM-GEG commented 3 weeks ago

@CKrawczyk @lcjohnso, thank you! I am happy for it to be merged whenever it works for either or both of you.

zooniverse / aggregation-for-caesar

Adding the ability to specify classes to filter user-skill calculation #796