erinversfeldcodes closed this issue 7 years ago
Hm, I suspect that your data is maybe not in the expected [n_samples, n_features] format? In scikit-learn and also here, X should be a [n_samples, n_features] array, and y should be a [n_samples] array.
After executing x_train, x_test, y_train, y_test = train_test_split(X, Y, random_state=0), can you please provide the output of the following:
for d in (x_train, x_test, y_train, y_test):
    print(d.shape)
This would help a lot with better understanding what's going on.
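For reference, the shape check requested above can be sketched so that it also works when the splits are plain Python lists. This is only an illustration: the helper name describe_shapes and the toy data are made up here and are not part of mlxtend or scikit-learn.

```python
import numpy as np


def describe_shapes(**arrays):
    """Return {name: shape} after coercing each input to a NumPy array."""
    return {name: np.asarray(values).shape for name, values in arrays.items()}


# Toy stand-ins for the real splits: 4 samples with 3 features each.
x_train = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]
y_train = [0, 1, 0, 1]

shapes = describe_shapes(x_train=x_train, y_train=y_train)
for name, shape in shapes.items():
    print(name, shape)
```

A feature matrix should report two dimensions (n_samples, n_features) and the label vector one dimension (n_samples,), with the same n_samples in both.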
I get AttributeError: 'list' object has no attribute 'shape'
But after casting everything to a numpy array like so:
for d in (x_train, x_test, y_train, y_test):
    print(np.array(d).shape)
I get the following:
(93L, 858L)
(32L, 858L)
(93L,)
(32L,)
And if I then cast everything to a numpy array instead of a list, I get:
ValueError: shapes (32, 14) and (1386, 103) not aligned: 14 (dim 1) != 1386 (dim 0)
Which is at least a different error, if a little annoying...
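As a reading aid for the ValueError above: the message comes from a matrix product whose inner dimensions disagree (14 columns on the left, 1386 rows on the right). The sketch below reproduces that situation with zero-filled arrays of the same shapes. It only shows how such a message arises and does not identify the cause in this thread; the variable names are illustrative.

```python
import numpy as np

n_samples = 32
inputs = np.zeros((n_samples, 14))   # 14 columns on the left-hand side
weights = np.zeros((1386, 103))      # 1386 rows on the right-hand side

try:
    np.dot(inputs, weights)
    mismatch = None
except ValueError as err:
    # The inner dimensions (14 vs. 1386) do not match.
    mismatch = str(err)
print(mismatch)

# With 1386 input columns the product is defined and has shape (32, 103).
ok = np.dot(np.zeros((n_samples, 1386)), weights)
print(ok.shape)
```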
Thanks for the extra info. I have a suspicion of what might be the problem, but could you share more details about the error message? E.g., the full error stack you are/were getting so that I could trace down the line of code that throws this error?
PS: I think this is a problem that is not unique to the EnsembleVoteClassifier. If you e.g., replace
ensemble = EnsembleVoteClassifier(clfs=[pipe1, pipe2], voting='soft', weights=[1, 1], verbose=2, refit=False)
ensemble.fit(list(x_train), list(y_train)) # this was giving the same error until I parsed x_train and y_train as lists
ensemble.predict(list(x_test)) # produces the error
ensemble.score(list(x_test), list(y_test)) # produces the error
by
from sklearn.linear_model import LogisticRegression
pipe2 = make_pipeline(ColumnSelector(cols=(15, 16, 17, 18, 19, 20, 21, 22)), LogisticRegression())
pipe2.fit(list(x_train), list(y_train)) # this was giving the same error until I parsed x_train and y_train as lists
pipe2.predict(list(x_test)) # produces the error
pipe2.score(list(x_test), list(y_test)) # produces the error
Does the same error occur?
Traceback (most recent call last):
File "[path]/HonoursProject/Myo/__init__.py", line 259, in <module>
set_up()
File "[path]/HonoursProject/Myo/__init__.py", line 76, in set_up
ensemble = voting_ensemble_classifier(spatial_classifier, gestural_classifier)
File "[path]\HonoursProject\Myo\ensemble_classifiers\voting.py", line 34, in voting_ensemble_classifier
ensemble.score(np.array(x_test), np.array(y_test))
File "[path]\lib\site-packages\sklearn\base.py", line 349, in score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
File "[path]\lib\site-packages\mlxtend\classifier\ensemble_vote.py", line 188, in predict
maj = np.argmax(self.predict_proba(X), axis=1)
File "[path]\lib\site-packages\mlxtend\classifier\ensemble_vote.py", line 221, in predict_proba
avg = np.average(self._predict_probas(X), axis=0, weights=self.weights)
File "[path]\lib\site-packages\mlxtend\classifier\ensemble_vote.py", line 263, in _predict_probas
return np.asarray([clf.predict_proba(X) for clf in self.clfs_])
File "[path]\lib\site-packages\sklearn\utils\metaestimators.py", line 54, in <lambda>
out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
File "[path]\lib\site-packages\sklearn\pipeline.py", line 377, in predict_proba
return self.steps[-1][-1].predict_proba(Xt)
File "[path]\lib\site-packages\sklearn\neural_network\multilayer_perceptron.py", line 1016, in predict_proba
y_pred = self._predict(X)
File "[path]\lib\site-packages\sklearn\neural_network\multilayer_perceptron.py", line 676, in _predict
self._forward_pass(activations)
File "[path]\lib\site-packages\sklearn\neural_network\multilayer_perceptron.py", line 104, in _forward_pass
self.coefs_[i])
File "[path]\lib\site-packages\sklearn\utils\extmath.py", line 189, in safe_sparse_dot
return fast_dot(a, b)
ValueError: shapes (32,14) and (1386,103) not aligned: 14 (dim 1) != 1386 (dim 0)
Hm, thanks, but this is not as helpful as I thought :P. For debugging purposes, can you try the LogisticRegression example suggested in my previous comment. And if that works, can you try using LogisticRegression instead of the MLPClassifier in your pipe_1 and pipe_2; maybe it's a bug in the MLP.
And yeah, I get the same error if I try to score the pipe using a logistic regression.
Traceback (most recent call last):
File "[path]/HonoursProject/Myo/__init__.py", line 257, in <module>
set_up()
File "[path]/HonoursProject/Myo/__init__.py", line 76, in set_up
ensemble_accuracy = voting_ensemble_classifier(spatial_classifier, gestural_classifier)
File "[path]\HonoursProject\Myo\ensemble_classifiers\voting.py", line 29, in voting_ensemble_classifier
test_pipe.fit(list(x_train), list(y_train))
File "[path]\lib\site-packages\sklearn\pipeline.py", line 268, in fit
Xt, fit_params = self._fit(X, y, **fit_params)
File "[path]\lib\site-packages\sklearn\pipeline.py", line 234, in _fit
Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
File "[path]\lib\site-packages\mlxtend\feature_selection\column_selector.py", line 44, in fit_transform
return self.transform(X=X, y=y)
File "[path]\lib\site-packages\mlxtend\feature_selection\column_selector.py", line 62, in transform
t = X[:, self.cols]
TypeError: list indices must be integers, not tuple
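The last frame of this traceback indexes X with [:, self.cols], which is NumPy-style column selection and is not supported by a plain Python list. A minimal sketch of the difference, using made-up toy data:

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6]]   # a plain list of lists
cols = (0, 2)

try:
    rows[:, cols]               # lists do not accept a (slice, columns) index
    list_error = None
except TypeError as err:
    list_error = err
print(type(list_error).__name__)

# After conversion to an array, the same kind of column selection works.
selected = np.asarray(rows)[:, list(cols)]
print(selected)
```

So passing np.asarray(x_train) rather than list(x_train) into a pipeline that starts with a column selector avoids this particular TypeError.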
It's entirely possible that my use case is the problem, rather than the framework. I'm trying to combine two classifiers, each trained on a different data set. Both data sets describe the same thing, but using different measurements. The one data set is smaller than the other. I'm trying to see if I can get a more accurate model by combining the two using the EnsembleVoteClassifier.
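The idea described above (two classifiers fitted separately, then combined by voting) can be sketched with scikit-learn alone by averaging predicted class probabilities. This is an assumption-laden illustration, not the EnsembleVoteClassifier implementation: take_columns is a hypothetical helper, iris stands in for the real data, and the two column groups play the role of the two sets of measurements.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def take_columns(X, cols):
    """Hypothetical helper: keep only the given feature columns."""
    return np.asarray(X)[:, cols]


X, y = load_iris(return_X_y=True)
idx = np.random.RandomState(0).permutation(len(X))
X, y = X[idx], y[idx]
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

pipe1 = make_pipeline(
    FunctionTransformer(take_columns, kw_args={"cols": [0, 1]}),
    LogisticRegression(max_iter=1000),
)
pipe2 = make_pipeline(
    FunctionTransformer(take_columns, kw_args={"cols": [2, 3]}),
    LogisticRegression(max_iter=1000),
)
pipe1.fit(X_train, y_train)            # larger training set
pipe2.fit(X_train[:60], y_train[:60])  # smaller training set

# Averaging probabilities only makes sense if both models share class labels.
assert np.array_equal(pipe1.classes_, pipe2.classes_)

# Soft vote: average the class probabilities, then pick the most likely class.
probas = np.mean(
    [pipe1.predict_proba(X_test), pipe2.predict_proba(X_test)], axis=0
)
pred = pipe1.classes_[np.argmax(probas, axis=1)]
accuracy = float(np.mean(pred == y_test))
print(accuracy)
```

Both pipelines receive the same full test matrix at prediction time, and each selects its own columns; the rows must therefore line up across the two sets of measurements.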
Hm, both should work. E.g., see the examples below:
import numpy as np
from sklearn.datasets import load_iris
from mlxtend.classifier import EnsembleVoteClassifier
from mlxtend.feature_selection import ColumnSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
iris = load_iris()
X = iris.data
y = iris.target
idx = np.arange(X.shape[0])
np.random.shuffle(idx)
X, y = X[idx], y[idx]
# Example 1: the ensemble fits both pipelines itself
pipe1 = make_pipeline(ColumnSelector(cols=[0, 1]), LogisticRegression())
pipe2 = make_pipeline(ColumnSelector(cols=[2, 3]), LogisticRegression())
ens = EnsembleVoteClassifier(clfs=[pipe1, pipe2])
ens.fit(X, y)
ens.score(X, y)
# Example 2: pipelines fitted beforehand on differently sized subsets, combined with refit=False
pipe1 = make_pipeline(ColumnSelector(cols=[0, 1]), LogisticRegression())
pipe1.fit(X[:100], y[:100])
pipe2 = make_pipeline(ColumnSelector(cols=[2, 3]), LogisticRegression())
pipe2.fit(X[:50], y[:50])
ens = EnsembleVoteClassifier(clfs=[pipe1, pipe2], refit=False)
ens.fit(X, y)
ens.score(X, y)
I am currently not seeing what the issue might be in your case, but maybe it's somehow related to the format of your dataset or so. You can try to run the example above on your dataset after setting X = x_train and y = y_train and see if that runs okay to get a better picture of what's going on.
It looks like this error was caused by the format of the dataset, as you suggested. When combining the data into a single CSV, Python was adding blank lines everywhere 🤦♀️
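For anyone hitting the same symptom, here is a small sketch of loading a CSV while skipping completely empty rows and checking that every remaining row has the same number of fields. The thread does not say how the file was written, so the sample text and the helper name read_rows are assumptions made for illustration. One common source of such blank lines in Python 3 is writing with the csv module on Windows to a file that was not opened with newline="".

```python
import csv
import io

# Toy CSV text with a blank line after every record.
raw = "1.0,2.0,0\r\n\r\n3.0,4.0,1\r\n\r\n5.0,6.0,0\r\n"


def read_rows(text):
    """Parse CSV text, skipping rows that are completely empty."""
    reader = csv.reader(io.StringIO(text, newline=""))
    return [row for row in reader if any(cell.strip() for cell in row)]


rows = read_rows(raw)
widths = {len(row) for row in rows}   # should contain exactly one value
print(len(rows), widths)
```

If widths contains more than one value, the rows do not form a rectangular [n_samples, n_features] table and the file needs a closer look before training.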
Glad to hear that it's fixed now!
I have two trained classifiers from which I am constructing an EnsembleVoteClassifier. I want to gauge the accuracy of this classifier, and so expect to be able to call score() using the test split of my data. However, I am having some issues in this regard. Specifically, calls to score and predict throw the TypeError specified in the title of this issue. The code is given below. Any ideas as to how I can resolve this?