Inconsistencies in ideal_labeler.py

ntucllab / libact

Pool-based active learning in Python

http://libact.readthedocs.org/

BSD 2-Clause "Simplified" License

Inconsistencies in ideal_labeler.py #159

Closed by bngksgl 5 years ago

bngksgl commented 5 years ago

Hi,

When I look at the label(self, feature) method in ideal_labeler.py, I see that it receives a feature vector from X_train and returns the label from y_train. It does this by finding the records whose features match the one passed in, taking the first occurrence, and returning that record's label. This leads to inconsistencies if multiple records share the same feature values. Wouldn't it be better to use the position of the queried ID instead? Or what is the logic behind the current approach?

def label(self, query_id):
    return self.y[query_id]

yangarbiter commented 5 years ago

The assumption of the ideal_labeler class is that the same feature vector always has the same label, which is quite realistic in most cases. ideal_labeler simply mimics the scenario where you have a feature vector and want to acquire the label of that specific feature vector, so the input is a feature vector instead of a query_id. If your scenario really needs to index the labels, you can always override the current label method with what you just wrote.
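For readers following along, here is a minimal standalone sketch of the two lookup strategies discussed above. It does not import libact; the class names FeatureLookupLabeler and IndexLookupLabeler are hypothetical stand-ins, and the toy data is made up purely to show how duplicate feature vectors with different labels make the two approaches disagree.

```python
from typing import Sequence


class FeatureLookupLabeler:
    """Hypothetical stand-in for a feature-based labeler (not libact's class)."""

    def __init__(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> None:
        # Store rows as tuples so they can be compared for equality.
        self.X = [tuple(row) for row in X]
        self.y = list(y)

    def label(self, feature: Sequence[float]) -> int:
        # Return the label of the FIRST stored row equal to the queried feature.
        # list.index raises ValueError if the feature is not in the dataset.
        return self.y[self.X.index(tuple(feature))]


class IndexLookupLabeler(FeatureLookupLabeler):
    """Override suggested in the thread: look the label up by position."""

    def label(self, query_id: int) -> int:
        return self.y[query_id]


# Made-up data: rows 0 and 2 are identical but carry different labels.
X = [[0.0, 1.0], [2.0, 3.0], [0.0, 1.0]]
y = [0, 1, 1]

by_feature = FeatureLookupLabeler(X, y)
by_index = IndexLookupLabeler(X, y)

first_match = by_feature.label(X[2])  # resolves to row 0, so row 0's label
positional = by_index.label(2)        # row 2's own label
```

With this data, the feature-based lookup returns the label of row 0 even though row 2 was queried, while the index-based override returns the label stored at position 2. If the "same feature, same label" assumption holds, both strategies agree.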

bngksgl commented 5 years ago

Thank you for your quick response. I changed it in my code :)

hsuantien commented 4 years ago

In [https://github.com/ntucllab/libact/commit/c3babc89ee4cc7672539de46f5a9267d16dccc1b] we changed the behavior of ideal_labeler to randomly sample one label when there are multiple choices.
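As a rough illustration of that behavior, the sketch below samples one label uniformly at random among all stored records that match the queried feature. It is a standalone approximation and not the code from the commit: the class name RandomTieLabeler, the seed parameter, and the error raised for unknown features are all assumptions made for this example.

```python
import random
from typing import Optional, Sequence


class RandomTieLabeler:
    """Hypothetical sketch: pick a random label among matching records."""

    def __init__(
        self,
        X: Sequence[Sequence[float]],
        y: Sequence[int],
        seed: Optional[int] = None,  # assumed parameter, for reproducibility
    ) -> None:
        self.X = [tuple(row) for row in X]
        self.y = list(y)
        self._rng = random.Random(seed)

    def label(self, feature: Sequence[float]) -> int:
        target = tuple(feature)
        # Collect the labels of every record whose features equal the query.
        candidates = [lbl for row, lbl in zip(self.X, self.y) if row == target]
        if not candidates:
            raise ValueError("feature not found in the stored dataset")
        # Uniformly sample one label when there are multiple choices.
        return self._rng.choice(candidates)


# Made-up data: rows 0 and 2 are identical but carry different labels.
X = [[0.0, 1.0], [2.0, 3.0], [0.0, 1.0]]
y = [0, 1, 1]

labeler = RandomTieLabeler(X, y, seed=0)
draws = {labeler.label([0.0, 1.0]) for _ in range(200)}
unique_row_label = labeler.label([2.0, 3.0])
```

For a feature vector that appears only once, the result is deterministic; for duplicated vectors, repeated queries may return any of the labels attached to the matching records.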