Just brainstorming some options for task creation. There is certainly relevant literature on this that should be explored (e.g., in reinforcement learning).
I think there are two conditions to vary, each with multiple possible settings.
System extractions:
  None
  Full (model tuned for F-score)
  Model tuned for recall (expect a high false-positive rate)
  Include random additional annotations (seeded false positives; do annotators find them?)
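A rough sketch of how the seeded-false-positive setting could be set up and scored. Everything here is hypothetical: spans are assumed to be (page, start, end) tuples, `candidate_spans` is assumed to be a pool of spans already known to be incorrect, and the names `seed_false_positives` and `decoy_rejection_rate` are made up for illustration.

```python
import random


def seed_false_positives(system_spans, candidate_spans, n_seeded, rng):
    """Mix up to n_seeded randomly chosen decoy spans into the system output.

    Assumption: candidate_spans holds spans known to be wrong, so any
    candidate not already in the system output can serve as a decoy.
    Returns the shuffled review queue and the set of decoys that were added.
    """
    system_set = set(system_spans)
    pool = [s for s in candidate_spans if s not in system_set]
    decoys = set(rng.sample(pool, min(n_seeded, len(pool))))
    queue = list(system_spans) + sorted(decoys)
    rng.shuffle(queue)  # hide which items were seeded
    return queue, decoys


def decoy_rejection_rate(decoys, rejected):
    """Fraction of seeded decoys that the annotator marked as incorrect."""
    if not decoys:
        return 0.0
    return len(decoys & set(rejected)) / len(decoys)
```

The rejection rate on decoys would then give one rough answer to "do annotators find them?", separate from how annotators treat the real system output.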
Negative example coding:
  None (annotators only review system extractions as true or false positives; the false positives become negative examples)
  Windowed
    nearby sentences
    pages
    page window based on empirical occurrence in the gold standard training set
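One possible reading of the empirical page window, sketched under assumptions: for each training document we have the pages with gold annotations and the pages the annotator would already be reviewing (called anchor pages here), and we pick the smallest window that covers a target fraction of gold annotations. The function name, the input shape, and the `coverage` parameter are all hypothetical.

```python
import math


def empirical_page_window(documents, coverage=0.9):
    """Smallest window w (in pages) such that at least `coverage` of gold
    annotation pages lie within w pages of some anchor page.

    documents: iterable of (gold_pages, anchor_pages) pairs, one per
    training document (an assumed input shape). Documents without anchor
    pages are skipped.
    """
    distances = sorted(
        min(abs(g - a) for a in anchors)
        for gold, anchors in documents
        if anchors
        for g in gold
    )
    if not distances:
        raise ValueError("no gold pages with a matching anchor page")
    # Index of the first sorted distance that reaches the requested coverage.
    k = max(math.ceil(coverage * len(distances)) - 1, 0)
    return distances[k]
```

A lower `coverage` gives a smaller window and less to read per task, at the cost of missing gold annotations that sit far from any reviewed page.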