Closed: seth814 closed this issue 8 years ago
Hi seth814,
I hope we can figure out what's going on in your case. I think the easiest way would be for you to upload your script (if you are okay with sharing it) so that I can take a look and inspect what's happening inside the plot_decision_regions function that may cause this behavior on your dataset.
One thing I can think of is non-supported input shapes of the NumPy arrays X and y in plot_decision_regions(X, y, classifier, resolution=0.02). This decision region plotting function expects X and y in the shapes that scikit-learn works with: y has to be a 1D integer-type array, and X has to be a 2D float (or integer) type array.
It would be nice if you could check your input data and let me know what the output of the print calls below looks like; that would be very helpful.
Input:
import numpy as np
y = np.array([1, 2, 0, 0, 2])
X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.],
              [8., 9.],
              [7., 8.]])
print('y:', y.shape, y.dtype)
print('X:', X.shape, X.dtype)
Output:
y: (5,) int64
X: (5, 2) float64
Above is an example of what the expected shapes and dtypes look like.
PS: I have a slightly more sophisticated function implemented here: http://rasbt.github.io/mlxtend/user_guide/evaluate/plot_decision_regions/
I am currently a bit busy (at SciPy 2016), but several people asked me about 3D decision spaces recently, which I am going to add soon!
No worries. I'm not in a huge hurry, but I am curious as to what is going on in the function.
I uploaded the data and file under Vertical Abduction in my repo. I tried to upload a zip but it said the format wasn't supported. The shapes and datatypes are both correct so it's probably something else.
About the FutureWarning, I don't think that's the issue here; it comes from the fact that y_train is a Pandas DataFrame, not a NumPy array. I'd just recommend putting y_train = y_train.values into your code.
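For illustration, here is a minimal sketch of that conversion, using stand-in data (the column name and values are hypothetical, not from the actual script):

```python
import numpy as np
import pandas as pd

# Hypothetical labels stored as a single-column DataFrame,
# as described for y_train in this thread.
y_train = pd.DataFrame({'label': [1, 2, 0, 0, 2]})

# .values exposes the underlying NumPy array; ravel() flattens the
# (n, 1) column into the 1D integer array the plotting function expects.
y_train = y_train.values.ravel().astype(int)

print(y_train.shape, y_train.dtype)  # (5,) and an integer dtype
```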
Hm, about the plot itself, I don't think this is a bug. This is what the decision region of the SVM looks like in this case -- you may want to do some hyperparameter tuning here. E.g., when I plot the first 2 dimensions of the input data, it kind of looks like this:
plt.scatter(X_train.values[:, 0], X_train.values[:, 1])
So, for a more visually pleasing analysis, you could maybe try a non-linear dimensionality reduction technique (e.g., kernel PCA or other manifold learning algorithms that are implemented in scikit-learn: http://scikit-learn.org/stable/modules/manifold.html)
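As a rough sketch of that suggestion, here is one way to project high-dimensional data down to 2D with a manifold learning method before plotting; the data is random stand-in data, not the dataset from this issue:

```python
import numpy as np
from sklearn.manifold import TSNE  # one of the scikit-learn manifold methods

rng = np.random.RandomState(0)
X_train = rng.rand(100, 10)  # stand-in: 100 samples, 10 features

# Reduce to 2 components so the result can be scatter-plotted
# (and passed to a 2D decision-region plot after re-fitting a classifier).
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X_train)
print(X_2d.shape)  # (100, 2)
```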
Sebastian,
I've been collecting my own data and have applied the plot_decision_regions function several times to my data but I am running into a problem with this new data. The problem is occurring here:
My enumerated object is: [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)], so five class labels.
From what I understand, this list comprehension passes over my X_train_pca data five times and uses the boolean comparison y == cl to plot all my data points with five different colors as it passes through the markers and colormap.
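The loop being described can be sketched like this, with stand-in data instead of the actual X_train_pca (one scatter call per class, selecting rows with the boolean mask y == cl):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# Stand-in data: 5 points, one per class.
X = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]])
y = np.array([0, 1, 2, 3, 4])

markers = ('s', 'x', 'o', '^', 'v')
colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
cmap = ListedColormap(colors[:len(np.unique(y))])

for idx, cl in enumerate(np.unique(y)):
    # y == cl is a boolean mask; X[y == cl, 0] keeps only rows of class cl
    plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
                alpha=0.8, c=[cmap(idx)], marker=markers[idx], label=cl)
```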
Upon running, I get the warning:
FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
  plt.scatter(x=X[y == cl, 0], y=X[y == cl, 1],
The really weird part is the values in the array X[y == cl, 0]. They now look like: [-0.4277726 -0.4277726 -0.44362509 ..., -0.4277726 -0.4277726 -0.4277726 ] with shape (9784,), which is the original length of my X_train_pca data. (I believe it should be closer to about a fifth of that, since my classes are similar in size, and I checked np.shape after the loop ran.)
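As a sanity check on the masking step, here is what the indexing is supposed to do when y is a proper 1D boolean-maskable array; the data is stand-in data, not the actual X_train_pca. If X[y == cl, 0] keeps the full length instead (9784 in the report above), y == cl is not acting as a boolean mask, and y's shape and dtype are worth inspecting:

```python
import numpy as np

# Stand-in data: 6 points split across 2 classes.
X = np.array([[1., 2.], [3., 4.], [5., 6.], [7., 8.], [9., 10.], [11., 12.]])
y = np.array([0, 1, 0, 1, 0, 1])

for cl in np.unique(y):
    mask = (y == cl)
    # A true boolean mask: dtype bool, same length as y, no FutureWarning.
    assert mask.dtype == bool and mask.shape == y.shape
    subset = X[mask, 0]
    print(cl, subset.shape)  # each class gets 3 of the 6 rows, not all 6
```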
To give a visual, my data looks like this.
When it should be separated into colors with a spread looking like this.
I can't really think through the problem any further, probably due to a misunderstanding of what this FutureWarning is trying to tell me. I'm wondering if you have any ideas as to what might cause this behavior.