Open NeilduToit13 opened 3 years ago
Why is your y_train.shape (84, 51)? Is it a multi-label problem, a multi-task problem, or a multi-class problem with one-hot encoded labels?
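One quick way to answer that question is to look at the row sums of the label array. The snippet below is only a rough sketch: the helper name `describe_labels` is made up for illustration and is not part of modAL or scikit-learn, and the heuristic assumes the labels are stored as 0/1 integers.

```python
import numpy as np

def describe_labels(y):
    """Guess the problem type from the shape and row sums of a label array.

    Hypothetical helper for illustration only; a heuristic, not a guarantee.
    """
    y = np.asarray(y)
    if y.ndim == 1:
        return "1-D labels (binary or multi-class)"
    is_binary = np.isin(y, (0, 1)).all()
    if is_binary and (y.sum(axis=1) == 1).all():
        # Exactly one positive per row is what one-hot encoding produces.
        return "looks one-hot encoded (multi-class)"
    if is_binary:
        # Rows with zero or several positives point to a multi-label target.
        return "looks multi-label"
    return "looks multi-output (non-binary columns)"

print(describe_labels(np.eye(3, dtype=int)[[0, 2, 1]]))   # one row per sample, one 1 per row
print(describe_labels(np.array([[1, 1, 0], [0, 0, 1]])))  # first row has two positives
```

Note that a multi-label dataset in which every sample happens to have exactly one label would also be reported as one-hot, so treat the result as a hint only.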
I tried to reproduce your error.
I assumed you have a multi-label problem and used make_multilabel_classification to build a synthetic classification dataset.
Here is the code I used:
!python --version
!pip install --quiet pandas==1.1.4
!pip install --quiet numpy==1.16.0
!pip install --quiet sklearn==0.0
!pip install --quiet modAL==0.4.0
from modAL.models import ActiveLearner
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_multilabel_classification
X, y = make_multilabel_classification(n_samples=1081, n_features=4926, n_classes=51, n_labels=3, allow_unlabeled=False, random_state=1)
X_train, X_POOL, y_train, y_POOL = X[:84], X[84:], y[:84], y[84:]
clf = DecisionTreeClassifier(max_depth=4)
print("Beginning debugging logs:")
print(f"classifier: {clf}")
print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"X_pool shape: {X_POOL.shape}")
learner = ActiveLearner(
    estimator=clf,
    X_training=X_train,
    y_training=y_train
)
result = learner.query(X_POOL)
And here is the error I get:
Python 3.8.10
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
Beginning debugging logs:
classifier: DecisionTreeClassifier(max_depth=4)
X_train shape: (84, 4926)
y_train shape: (84, 51)
X_pool shape: (997, 4926)
/usr/local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:87: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-b598ae292361> in <module>
26 )
27
---> 28 result = learner.query(X_POOL)
/usr/local/lib/python3.8/site-packages/modAL/models/base.py in query(self, X_pool, *query_args, **query_kwargs)
259 labelled upon query synthesis.
260 """
--> 261 query_result = self.query_strategy(self, X_pool, *query_args, **query_kwargs)
262
263 if isinstance(query_result, tuple):
/usr/local/lib/python3.8/site-packages/modAL/uncertainty.py in uncertainty_sampling(classifier, X, n_instances, random_tie_break, **uncertainty_measure_kwargs)
150 the instances from X chosen to be labelled.
151 """
--> 152 uncertainty = classifier_uncertainty(classifier, X, **uncertainty_measure_kwargs)
153
154 if not random_tie_break:
/usr/local/lib/python3.8/site-packages/modAL/uncertainty.py in classifier_uncertainty(classifier, X, **predict_proba_kwargs)
80
81 # for each point, select the maximum uncertainty
---> 82 uncertainty = 1 - np.max(classwise_uncertainty, axis=1)
83 return uncertainty
84
<__array_function__ internals> in amax(*args, **kwargs)
/usr/local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims, initial, where)
2703 sub-class' method does not implement `keepdims` any
2704 exceptions will be raised.
-> 2705 initial : scalar, optional
2706 The starting value for this product. See `~numpy.ufunc.reduce` for details.
2707
/usr/local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
85
86 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
---> 87
88
89 def _take_dispatcher(a, indices, axis=None, out=None, mode=None):
ValueError: could not broadcast input array from shape (997,2) into shape (997)
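So I do not get exactly the same error as you, which is odd. The traceback is consistent with predict_proba returning one (997, 2) array per label when y_train is 2-D, which the default uncertainty sampling cannot reduce to a single score per sample. Long story short: if you are dealing with a multi-label problem, you should use modAL.multilabel; if you are in a multi-class setting with one-hot encoded labels, you should convert them back to one-dimensional labels.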
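For the one-hot case, converting back to one-dimensional labels can be sketched with NumPy alone. The function name `one_hot_to_labels` and the toy array are assumptions made for illustration; the only real API used is `numpy.argmax`.

```python
import numpy as np

def one_hot_to_labels(y_one_hot):
    """Convert an (n_samples, n_classes) one-hot array to 1-D class indices.

    Hypothetical helper for illustration; it refuses input that is not
    strictly one-hot, because argmax would silently drop labels otherwise.
    """
    y = np.asarray(y_one_hot)
    is_one_hot = (
        y.ndim == 2
        and np.isin(y, (0, 1)).all()
        and (y.sum(axis=1) == 1).all()
    )
    if not is_one_hot:
        raise ValueError("expected a 2-D array with exactly one 1 per row")
    return y.argmax(axis=1)

# Toy stand-in for a one-hot y_train with 51 classes (not the real data).
y_one_hot = np.eye(51, dtype=int)[[3, 0, 50, 3]]
y_1d = one_hot_to_labels(y_one_hot)
print(y_1d)        # class index of each sample
print(y_1d.shape)  # one-dimensional: (n_samples,)
```

The resulting 1-D array is what you would then pass as y_training. If the check raises, the data is probably multi-label and this conversion does not apply.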
Apologies for posting here. I've been unable to fix this error after two days of trying and searching Stack Overflow. I'm hoping you will have an idea of what I've done wrong. Thanks.
Here is the traceback:
The environment is Python 3.8 with: