modAL-python / modAL

A modular active learning framework for Python
https://modAL-python.github.io/
MIT License

ValueError: could not broadcast input array from shape (997,1) into shape (997) #129

Open NeilduToit13 opened 3 years ago

NeilduToit13 commented 3 years ago

Apologies for posting here. I've been unable to fix this error after two days of trying and checking Stack Overflow. I'm hoping you will have an idea of what I've done wrong. Thanks.

from modAL.models import ActiveLearner

# ... fetch data

print("Beginning debugging logs:")
print(f"classifier: {clf}")
print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"X_pool shape: {X_POOL.shape}")

learner = ActiveLearner(
    estimator=clf,
    X_training=X_train,
    y_training=y_train
    )

result = learner.query(X_POOL)

Here is the output and traceback:

classifier: DecisionTreeClassifier(max_depth=4)
X_train shape: (84, 4926)
y_train shape: (84, 51)
X_pool shape: (997, 4926)

  File "rpc_server.py", line 139, in <module>
    result = learner.query(X_POOL)
  File "/home/ubuntu/venv/lib/python3.8/site-packages/modAL/models/base.py", line 261, in query
    query_result = self.query_strategy(self, X_pool, *query_args, **query_kwargs)
  File "/home/ubuntu/venv/lib/python3.8/site-packages/modAL/uncertainty.py", line 152, in uncertainty_sampling
    uncertainty = classifier_uncertainty(classifier, X, **uncertainty_measure_kwargs)
  File "/home/ubuntu/venv/lib/python3.8/site-packages/modAL/uncertainty.py", line 82, in classifier_uncertainty
    uncertainty = 1 - np.max(classwise_uncertainty, axis=1)
  File "/home/ubuntu/venv/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2504, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out, keepdims=keepdims,
  File "/home/ubuntu/venv/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: could not broadcast input array from shape (997,1) into shape (997)

The environment is Python 3.8 with:

pandas==1.1.4
numpy==1.16.0
sklearn==0.0
modAL==0.4.0

damienlancry commented 3 years ago

Why is your y_train.shape (84, 51)? Is it a multi-label problem, a multi-task problem, or a multi-class problem with one-hot encoded labels?
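One rough way to answer that question is to count the active entries in each row of the label matrix. The snippet below is only a heuristic sketch: `describe_label_matrix` is a hypothetical helper (not part of modAL or scikit-learn), and it assumes a 2-D `y` is a 0/1 indicator matrix.

```python
import numpy as np


def describe_label_matrix(y):
    """Guess what kind of target a label array encodes (hypothetical helper)."""
    y = np.asarray(y)
    if y.ndim == 1:
        return "1-D labels (binary or multi-class)"
    if not np.isin(y, (0, 1)).all():
        # Columns hold arbitrary class ids: probably one task per column.
        return "multi-output (non-binary columns)"
    # In a one-hot encoding, every row has exactly one active entry.
    labels_per_row = y.sum(axis=1)
    if np.all(labels_per_row == 1):
        return "one-hot encoded multi-class"
    return "multi-label"


# Toy data for illustration only.
one_hot = np.eye(3, dtype=int)[[0, 2, 1, 2]]
multi_label = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1]])

print(describe_label_matrix(one_hot))
print(describe_label_matrix(multi_label))
```

A multi-label dataset in which every sample happens to carry exactly one label would be reported as one-hot, so the result is a hint rather than a proof.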

damienlancry commented 3 years ago

I tried to reproduce your error. I assumed this was a multi-label problem and used make_multilabel_classification to build a synthetic classification dataset. Here is the code I used:

!python --version
!pip install --quiet pandas==1.1.4
!pip install --quiet numpy==1.16.0
!pip install --quiet sklearn==0.0
!pip install --quiet modAL==0.4.0

from modAL.models import ActiveLearner
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_multilabel_classification

X, y = make_multilabel_classification(
    n_samples=1081,
    n_features=4926,
    n_classes=51,
    n_labels=3,
    allow_unlabeled=False,
    random_state=1,
)
X_train, X_POOL, y_train, y_POOL = X[:84], X[84:], y[:84], y[84:]
clf = DecisionTreeClassifier(max_depth=4)

print("Beginning debugging logs:")
print(f"classifier: {clf}")
print(f"X_train shape: {X_train.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"X_pool shape: {X_POOL.shape}")

learner = ActiveLearner(
    estimator=clf,
    X_training=X_train,
    y_training=y_train
    )

result = learner.query(X_POOL)

And here is the error I get:

Python 3.8.10
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
WARNING: Running pip as root will break packages and permissions. You should install packages reliably by using venv: https://pip.pypa.io/warnings/venv
Beginning debugging logs:
classifier: DecisionTreeClassifier(max_depth=4)
X_train shape: (84, 4926)
y_train shape: (84, 51)
X_pool shape: (997, 4926)
/usr/local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:87: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-b598ae292361> in <module>
     26     )
     27 
---> 28 result = learner.query(X_POOL)

/usr/local/lib/python3.8/site-packages/modAL/models/base.py in query(self, X_pool, *query_args, **query_kwargs)
    259             labelled upon query synthesis.
    260         """
--> 261         query_result = self.query_strategy(self, X_pool, *query_args, **query_kwargs)
    262 
    263         if isinstance(query_result, tuple):

/usr/local/lib/python3.8/site-packages/modAL/uncertainty.py in uncertainty_sampling(classifier, X, n_instances, random_tie_break, **uncertainty_measure_kwargs)
    150         the instances from X chosen to be labelled.
    151     """
--> 152     uncertainty = classifier_uncertainty(classifier, X, **uncertainty_measure_kwargs)
    153 
    154     if not random_tie_break:

/usr/local/lib/python3.8/site-packages/modAL/uncertainty.py in classifier_uncertainty(classifier, X, **predict_proba_kwargs)
     80 
     81     # for each point, select the maximum uncertainty
---> 82     uncertainty = 1 - np.max(classwise_uncertainty, axis=1)
     83     return uncertainty
     84 

<__array_function__ internals> in amax(*args, **kwargs)

/usr/local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims, initial, where)
   2703         sub-class' method does not implement `keepdims` any
   2704         exceptions will be raised.
-> 2705     initial : scalar, optional
   2706         The starting value for this product. See `~numpy.ufunc.reduce` for details.
   2707 

/usr/local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     85 
     86     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
---> 87 
     88 
     89 def _take_dispatcher(a, indices, axis=None, out=None, mode=None):

ValueError: could not broadcast input array from shape (997,2) into shape (997)

So I do not get exactly the same error as you, which is weird: my shape is (997,2) where yours is (997,1). But long story short, if you are dealing with a multi-label problem, you should use modAL.multilabel, and if you are in a multi-class setting with one-hot encoded labels, you should revert to one-dimensional labels.
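For the one-hot case, reverting to one-dimensional labels can be done with `np.argmax`. This is a minimal sketch on synthetic data: the array below is made up to match the (84, 51) shape from the report, and it assumes each row contains exactly one 1.

```python
import numpy as np

# Synthetic stand-in for y_train: 84 samples, 51 classes, one-hot encoded.
rng = np.random.RandomState(0)
class_ids = rng.randint(0, 51, size=84)
y_one_hot = np.eye(51, dtype=int)[class_ids]

# The index of the single 1 in each row is the class label.
y_1d = np.argmax(y_one_hot, axis=1)

print(y_one_hot.shape)  # (84, 51)
print(y_1d.shape)       # (84,)
```

Passing an array like `y_1d` as `y_training` gives the estimator a single multi-class target, so `predict_proba` should return one 2-D array instead of one array per column. If the rows can contain several 1s, this conversion discards labels and the multi-label route is the appropriate one.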