Closed: Juned-Ansari closed this issue 3 years ago
I don't think we can support this type. Is it possible to convert it to a NumPy array?
Indeed, the solution was simply to convert the sparse matrix to a dense array with
X_train = X_train.toarray()
as pointed out in this (accepted) SO answer on the exact same issue: https://stackoverflow.com/questions/66495126/typeerror-unsupported-type-class-scipy-sparse-csr-csr-matrix-for-structured/66614090#66614090
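For readers hitting the same error, here is a minimal sketch of that conversion applied to both splits, since `X_test` is produced by the same transformer and would also be sparse. The helper name `to_dense` and the tiny stand-in matrices are assumptions made for illustration; they are not part of AutoKeras or of the original report.

```python
import numpy as np
from scipy import sparse


def to_dense(features):
    """Return a dense NumPy array, converting SciPy sparse input if needed."""
    if sparse.issparse(features):
        return features.toarray()
    return np.asarray(features)


# Stand-ins for the sparse output of a vectorizer (hypothetical data).
X_train = sparse.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))
X_test = sparse.csr_matrix(np.array([[0.0, 3.0]]))

# Convert both splits before passing them to fit() and evaluate().
X_train = to_dense(X_train)
X_test = to_dense(X_test)
```

Note that a dense array stores every zero explicitly, so this approach is only practical when the full matrix fits in memory, as it does for the 80 x 92 case in this issue.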
We can safely close this now @haifeng-jin.
Bug Description
Unsupported type <class 'scipy.sparse.csr.csr_matrix'> for StructuredDataAdapter. When I convert multiple text columns into vectors, the result is a csr_matrix, which is not supported by AutoKeras.
Bug Reproduction
using transformers
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler

column_trans = ColumnTransformer(
    [
        ('CompanyName_bow', TfidfVectorizer(), 'CompanyName'),
        ('state_category', OneHotEncoder(), ['state']),
        ('Termination_Reason_Desc_bow', TfidfVectorizer(), 'Termination_Reason_Desc'),
        ('TermType_category', OneHotEncoder(), ['TermType'])
    ],
    remainder=MinMaxScaler()
)
X = column_trans.fit_transform(X.head(100))

from sklearn.preprocessing import LabelEncoder
y = LabelEncoder().fit_transform(y.head(100))

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=5)
X_train.shape #(80, 92)
X_test.shape #(20, 92)
y_train.shape #(80,)
X_train.todense()
matrix([[0. , 0. , 0. , ..., 0.26921709, 1. ,
type(X_train) --> scipy.sparse.csr.csr_matrix
print(y_train) array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
type(y_train) --> numpy.ndarray
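The sparse type reported above comes from `ColumnTransformer`, which stacks its outputs as a sparse matrix when their overall density falls below its `sparse_threshold` parameter (0.3 by default). As an alternative to calling `.toarray()` afterwards, setting `sparse_threshold=0` asks the transformer to always return a dense array. The sketch below assumes a small made-up DataFrame: the column names mirror the report, while the values and the `amount` column are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy frame; only the column names come from the issue.
df = pd.DataFrame({
    "CompanyName": ["acme corp", "globex inc", "acme holdings"],
    "state": ["NY", "CA", "NY"],
    "amount": [10.0, 20.0, 30.0],
})

column_trans = ColumnTransformer(
    [
        ("CompanyName_bow", TfidfVectorizer(), "CompanyName"),
        ("state_category", OneHotEncoder(), ["state"]),
    ],
    remainder=MinMaxScaler(),
    sparse_threshold=0,  # never stack the combined result as a sparse matrix
)

X_dense = column_trans.fit_transform(df)
print(type(X_dense))
```

Either route should give the adapter a plain NumPy array; which one is preferable depends on whether the sparse form is still needed elsewhere in the pipeline.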
use autokeras to find a model for the sonar dataset
from numpy import asarray
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from autokeras import StructuredDataClassifier
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
define the search
search = StructuredDataClassifier(max_trials=15)
perform the search
search.fit(x=(X_train), y=y_train, verbose=0)
evaluate the model
loss, acc = search.evaluate(X_test, y_test, verbose=0)
print('Accuracy: %.3f' % acc)
Expected Behavior
(80, 92) (20, 92) (80,) (20,)
INFO:tensorflow:Reloading Oracle from existing project .\structured_data_classifier\oracle.json
INFO:tensorflow:Reloading Tuner from .\structured_data_classifier\tuner0.json
TypeError Traceback (most recent call last)