Return confidence score for query samplers

modAL-python / modAL

A modular active learning framework for Python

https://modAL-python.github.io/

MIT License


Return confidence score for query samplers #55

Open AlexandreAbraham opened 5 years ago

AlexandreAbraham commented 5 years ago

Dear modAL team, I am trying to use modAL's query strategies outside of the ActiveLearner, and I would like to use the confidence scores, but the query sampling functions do not return them. For example, uncertainty_sampling returns the indices of the selected samples and the samples themselves, but not the score associated with each of them. Do you think a kwarg such as return_scores=False (similar to return_proba for predictors in some estimators), which adds the scores to the returned tuple, would be a good idea?

Thanks for your feedback.
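To make the proposal concrete, here is a minimal sketch of what a return_scores switch could look like. It is not modAL code: select_most_uncertain is a hypothetical helper, and it assumes the class probabilities have already been computed (for example with a classifier's predict_proba) instead of taking the classifier itself. The score used is the least-confidence uncertainty, 1 minus the highest class probability.

```python
from typing import Tuple, Union

import numpy as np


def select_most_uncertain(
    proba: np.ndarray, n_instances: int = 1, return_scores: bool = False
) -> Union[np.ndarray, Tuple[np.ndarray, np.ndarray]]:
    """Hypothetical helper (not part of modAL).

    `proba` is an (n_samples, n_classes) array of class probabilities.
    """
    # Least-confidence uncertainty: 1 minus the highest class probability.
    uncertainty = 1.0 - proba.max(axis=1)
    # Indices of the n_instances largest uncertainties, most uncertain first.
    query_idx = np.argsort(-uncertainty, kind="stable")[:n_instances]
    if return_scores:
        # Assumed behaviour of the proposed kwarg: append the scores.
        return query_idx, uncertainty[query_idx]
    return query_idx


proba = np.array([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]])
idx, scores = select_most_uncertain(proba, n_instances=2, return_scores=True)
```

With return_scores left at its default, the call returns only the indices, so the shape of the return value depends on the flag.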

cosmic-cortex commented 5 years ago

Hi!

I haven't planned this feature, but this would be useful indeed. Meanwhile, if you need this feature urgently, the individual query strategies can be easily modified to achieve this. For example:

def uncertainty_sampling_mod(classifier: BaseEstimator, X: modALinput, n_instances: int = 1) -> Tuple[np.ndarray, modALinput, np.ndarray]:
    uncertainty = classifier_uncertainty(classifier, X)
    query_idx = multi_argmax(uncertainty, n_instances=n_instances)

    return query_idx, X[query_idx], uncertainty[query_idx]

This returns the uncertainties in every case. When you initialize the ActiveLearner, you can simply pass this function as the query strategy, and it should work. (I haven't tested this, however :) )

AlexandreAbraham commented 5 years ago

Dear Tivadar,

Thanks for the tip. I have indeed modified the code to fit my needs. My question is: would you be interested in a PR on this, and if so, what design would you prefer?

cosmic-cortex commented 5 years ago

Sorry for the delayed answer, I was extremely busy in the past few days. I am not sure what a good design for this feature would be. One alternative would be the return_scores=True keyword argument, but I really don't like the pattern where the return value of a function can have more than one type, like a tuple of 2 in one case and a tuple of 3 in another. Do you have any ideas for this?
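One possible way around the variable-length return, sketched here purely as an illustration and not as anything decided in this thread, is to always return the same named structure and let callers ignore the fields they do not need. QueryResult and uncertainty_query are hypothetical names, and the function again assumes precomputed class probabilities rather than a modAL classifier.

```python
from typing import NamedTuple

import numpy as np


class QueryResult(NamedTuple):
    """Hypothetical fixed-shape result (not part of modAL)."""

    query_idx: np.ndarray
    instances: np.ndarray
    scores: np.ndarray


def uncertainty_query(
    proba: np.ndarray, X: np.ndarray, n_instances: int = 1
) -> QueryResult:
    # Least-confidence uncertainty: 1 minus the highest class probability.
    uncertainty = 1.0 - proba.max(axis=1)
    # Most uncertain samples first.
    query_idx = np.argsort(-uncertainty, kind="stable")[:n_instances]
    # The result always has the same three fields, scores included.
    return QueryResult(query_idx, X[query_idx], uncertainty[query_idx])


X = np.array([[0.0], [1.0], [2.0]])
proba = np.array([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]])
result = uncertainty_query(proba, X, n_instances=1)
```

The return type is then the same on every call, and fields are read by name (result.scores). The trade-off is that existing code which unpacks exactly two values from a query strategy would have to change.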