modAL-python / modAL

A modular active learning framework for Python
https://modAL-python.github.io/
MIT License
2.24k stars, 324 forks

Issue when using Committee.teach(X) on a learner initialized without data #97

Closed OskarLiew closed 4 years ago

OskarLiew commented 4 years ago

Hello,

When trying to implement a cold-start query strategy for my AL application, I encountered a bug with the Committee class.

A minimal example that works for an ActiveLearner but fails for a Committee:

import numpy as np
from sklearn.svm import SVC
from modAL.uncertainty import uncertainty_sampling
from modAL.disagreement import vote_entropy_sampling
from modAL.models import ActiveLearner, Committee

# Create data
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])

y = np.array([1, 1, 0, 0])

X_teach = X[:3]
y_teach = y[:3]

X_query = X[3].reshape((-1, 2))
y_query = y[3]

# ActiveLearner
svm = SVC(probability=True)
learner = ActiveLearner(svm, uncertainty_sampling)

learner.teach(X_teach, y_teach)

query_idx, query_inst = learner.query(X_query)

print('Index %d, Coordinates %s' % (query_idx, query_inst))

# Committee
estimators = [SVC(probability=True) for _ in range(20)]
learner_list = [ActiveLearner(estimator, uncertainty_sampling) for estimator in estimators]
committee = Committee(learner_list, vote_entropy_sampling)

committee.teach(X_teach, y_teach)

query_idx, query_inst = committee.query(X_query)

print('Index %d, Coordinates %s' % (query_idx, query_inst))

Gives output:

Index 0, Coordinates [[1 1]]
Traceback (most recent call last):
  File "c:/Users/OskarLiew/Documents/Tests/modAL test/modAL_issue.py", line 38, in <module>
    query_idx, query_inst = committee.query(X_query)
  File "C:\Users\OskarLiew\Documents\Tests\modAL test\env\lib\site-packages\modAL\models\base.py", line 319, in query
    query_result = self.query_strategy(self, *query_args, **query_kwargs)
  File "C:\Users\OskarLiew\Documents\Tests\modAL test\env\lib\site-packages\modAL\disagreement.py", line 124, in vote_entropy_sampling
    disagreement = vote_entropy(committee, X, **disagreement_measure_kwargs)
  File "C:\Users\OskarLiew\Documents\Tests\modAL test\env\lib\site-packages\modAL\disagreement.py", line 37, in vote_entropy
    p_vote = np.zeros(shape=(X.shape[0], len(committee.classes_)))
TypeError: object of type 'NoneType' has no len()

The problem is that when the training data is updated in BaseCommittee.teach(), Committee._set_classes() is run before the models have been trained. The result is a list of trained estimators while Committee.classes_ is still None.

I could fix this by moving the Committee._set_classes() call from Committee._add_training_data() to a new function Committee.teach(), where it runs after the data has been added and the models have been trained. ~I will create a pull request soon, so you can look at this~.

Edit: This fix has already been implemented
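
For readers following along, the ordering problem described above can be sketched with toy classes. This is only an illustration under stated assumptions: ToyLearner and ToyCommittee are hypothetical stand-ins written for this example and are not modAL's actual implementation. Only the idea of calling _set_classes() before or after training comes from the description above.

```python
class ToyLearner:
    """Hypothetical stand-in for a learner; not modAL's ActiveLearner."""

    def __init__(self):
        self.classes_ = None  # unknown until fit() has seen labels
        self.X, self.y = [], []

    def add_training_data(self, X, y):
        self.X.extend(X)
        self.y.extend(y)

    def fit(self):
        # Like a scikit-learn estimator, classes_ only exists after fitting.
        self.classes_ = sorted(set(self.y))


class ToyCommittee:
    """Hypothetical committee that can set classes before or after fitting."""

    def __init__(self, learners, set_classes_after_fit):
        self.learners = learners
        self.set_classes_after_fit = set_classes_after_fit
        self.classes_ = None

    def _set_classes(self):
        known = [l.classes_ for l in self.learners if l.classes_ is not None]
        self.classes_ = sorted(set().union(*known)) if known else None

    def teach(self, X, y):
        for learner in self.learners:
            learner.add_training_data(X, y)
        if not self.set_classes_after_fit:
            self._set_classes()  # too early: no learner has been fitted yet
        for learner in self.learners:
            learner.fit()
        if self.set_classes_after_fit:
            self._set_classes()  # learners are fitted, classes are known


X_teach = [[0, 0], [0, 1], [1, 0]]
y_teach = [1, 1, 0]

buggy = ToyCommittee([ToyLearner() for _ in range(3)], set_classes_after_fit=False)
buggy.teach(X_teach, y_teach)

fixed = ToyCommittee([ToyLearner() for _ in range(3)], set_classes_after_fit=True)
fixed.teach(X_teach, y_teach)

print(buggy.classes_)  # None, although every learner is fitted
print(fixed.classes_)  # [0, 1]
```

In the early-call variant every learner ends up trained, yet the committee's classes_ stays None, which is the state that makes len(committee.classes_) fail in the traceback.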

OskarLiew commented 4 years ago

I saw that this issue was fixed in the source code, in the same way as I did, no less. I have modAL version 0.3.5, installed via pip 20.2.2 on Python 3.7.

Is it possible to upload this fix to pip?

cosmic-cortex commented 4 years ago

Hi! Sorry for the issue, I haven't made a new release in a long time :( I have created release 0.3.6 with all the fixes since the previous release. It should be available on PyPI now. I'll close the issue, but feel free to reopen it if there is something wrong in 0.3.6!

OskarLiew commented 4 years ago

Hello, me again.

Stumbled upon this issue again, but this time with the fit function. The problem is still that committee.classes_ doesn't get initialized unless the committee is itself initialized with data of all classes. This time I didn't find the fix in the code, so I will send a pull request shortly.
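
As a rough sketch of what refreshing the classes after fitting could look like, the helper below gathers classes_ from whichever estimators have been fitted. The name collect_classes is hypothetical and not part of modAL; the only assumption is that a fitted estimator exposes a classes_ attribute, as scikit-learn classifiers do. SimpleNamespace objects stand in for estimators so the example runs without any dependencies.

```python
from types import SimpleNamespace


def collect_classes(estimators):
    """Union of classes_ over fitted estimators; None if none are fitted.

    Hypothetical helper for illustration, not a modAL function.
    """
    found = set()
    for est in estimators:
        classes = getattr(est, "classes_", None)  # missing on unfitted estimators
        if classes is not None:
            found.update(classes)
    return sorted(found) if found else None


# Stand-ins for estimators: an unfitted one has no classes_ attribute.
unfitted = SimpleNamespace()
fitted_a = SimpleNamespace(classes_=[0, 1])
fitted_b = SimpleNamespace(classes_=[1, 2])

before_fit = collect_classes([unfitted, unfitted])
after_fit = collect_classes([fitted_a, fitted_b])

print(before_fit)  # None
print(after_fit)   # [0, 1, 2]
```

The point is the same as with teach: the committee-level classes can only be derived once the underlying estimators have been fitted, so the refresh has to happen after fitting rather than before.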