Open ecstar opened 8 years ago
Sure, it would be a nice enhancement for the future. The reason we didn't design an interface for batch labeling in the first place is that the algorithms we are implementing are not designed for that kind of setting. Maybe in the future we can start to include some batch-mode active learning algorithms.
Thanks.
+1 for batch query - this can become a game-changer in cases where crowdworkers are involved for labeling. Many if not most of the MTurk tasks are batch oriented.
However, after an initial reading of the AAAI15 ALBL paper, I see what @yangarbiter meant by "algorithms [are not] designed under [batch labeling] setting." If I understand the paper correctly, the underlying algorithms model the multi-armed bandit problem (actually a contextual bandit), and this formulation restricts the sampling step to choosing a single unlabeled example (i.e. the gambler chooses one arm).
This raises the question of other settings for the multi-armed bandit problem where the gambler can choose multiple arms at one time. Given the framing of the problem, I wonder if anyone has considered such a variation of the formulation. If not, it would be very interesting to consider such a formulation, paired with a simple crowdworker-backed labeling task.
@ecstar: if it's not too much trouble, would you mind sharing your hack?
Most of the query strategies calculate a score for each instance and query the instance with the largest score. So in theory, batch query can be done by selecting the n top-scored instances. But I don't think this would be a good way of doing batch query.
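To make the top-n idea concrete, here is a minimal sketch in plain Python. It is not part of the library: the `top_n_queries` helper and the `scores` mapping (entry id to query score) are hypothetical names used only for illustration, and it assumes a higher score means a more useful instance to query.

```python
import heapq


def top_n_queries(scores, n):
    """Return the ids of the n highest-scoring instances, best first.

    `scores` is a hypothetical mapping from entry id to the score a
    query strategy assigned to that unlabeled instance.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    # nlargest returns (entry_id, score) pairs ordered by descending score.
    best = heapq.nlargest(n, scores.items(), key=lambda pair: pair[1])
    return [entry_id for entry_id, _ in best]


# Example: five unlabeled instances with made-up scores.
scores = {0: 0.10, 1: 0.85, 2: 0.40, 3: 0.92, 4: 0.55}
batch = top_n_queries(scores, 3)
```

One likely reason this is not ideal: the top-scored instances are chosen independently of each other, so a batch can end up containing several near-duplicate examples.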
In certain applications, you might want to know what the top N unlabelled entities are so that a human can go through and do batch labeling offline. Right now I have a particularly hacky way of getting multiple results out, just assuming the majority class in the update, but it would be great to tweak the make_query function to return an arbitrary number of ordered results for batch label processing.

    for i in range(20):
        item_to_investigate = qs.make_query()
        # Assume the majority class (0) so the next query returns a different item.
        libact_ds.update(item_to_investigate, 0)
        print(item_to_investigate)
Happy to contribute code to try to help this happen!
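For reference, the workaround above could be wrapped in a small helper along these lines. This is only a sketch: `batch_query` is a hypothetical name, and `ToyDataset` and `ToyStrategy` are stand-ins written for this example so it runs without the library. The only assumption about the real objects is what the loop above already uses: the strategy exposes `make_query()` and the dataset exposes `update(entry_id, label)`.

```python
def batch_query(qs, dataset, n, placeholder_label=0):
    """Collect n entry ids by querying repeatedly with a placeholder label."""
    batch = []
    for _ in range(n):
        entry_id = qs.make_query()
        # Mark the entry as labeled so the next query picks a different one.
        dataset.update(entry_id, placeholder_label)
        batch.append(entry_id)
    return batch


class ToyDataset:
    """Hypothetical stand-in: labels[i] is None while entry i is unlabeled."""

    def __init__(self, labels):
        self.labels = list(labels)

    def update(self, entry_id, label):
        self.labels[entry_id] = label


class ToyStrategy:
    """Hypothetical stand-in: queries the unlabeled entry with the top score."""

    def __init__(self, dataset, scores):
        self.dataset = dataset
        self.scores = scores

    def make_query(self):
        unlabeled = [i for i, y in enumerate(self.dataset.labels) if y is None]
        return max(unlabeled, key=lambda i: self.scores[i])


ds = ToyDataset([None] * 5)
qs = ToyStrategy(ds, [0.10, 0.85, 0.40, 0.92, 0.55])
batch = batch_query(qs, ds, 3)
```

The placeholder labels are not real annotations, so they would need to be overwritten with the human-provided labels once the batch comes back. Any model retrained in the meantime would be learning from the assumed majority class.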