kingfengji / gcForest

This is the official implementation for the paper 'Deep forest: Towards an alternative to deep neural networks'
http://lamda.nju.edu.cn/code_gcForest.ashx

Use gcForest to train my own image dataset #44

Closed by marooncn 6 years ago

marooncn commented 6 years ago

Hi, thanks for your excellent work. I want to use gcForest to train on my own image dataset, but I can't find an example to refer to. I tried writing it myself as follows:

task: use gcForest to classify normal and defect metal images
image size: 256*256
train size: 320 (160 normal + 160 defect)
test size: 80 (40 normal + 40 defect)
folder tree:
 data
   --train
     --normal
     --defect
   --test
     --normal
     --defect
 main.py
 cascade.json

main.py

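import os

import numpy as np
from PIL import Image
from sklearn.metrics import accuracy_score

# gcForest imports as used in the repository's README and examples;
# this assumes the repository's lib/ directory is on the Python path.
from gcforest.gcforest import GCForest
from gcforest.utils.config_utils import load_json

parent_path=os.path.dirname(os.path.realpath(__file__))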
train_data_dir = parent_path + '/data/train/'
validation_data_dir = parent_path + '/data/test/'
X_train=[]
Y_train=[]
X_test=[]
Y_test=[]

for directory in os.listdir(train_data_dir):
    for file in os.listdir(train_data_dir+directory):
        print(train_data_dir+directory+"/"+file)
        img=Image.open(train_data_dir+directory+"/"+file).convert('L')
        featurevector=np.array(img).flatten() 
        X_train.append(featurevector)
        Y_train.append(directory)

for directory in os.listdir(validation_data_dir):
    for file in os.listdir(validation_data_dir+directory):
        print(validation_data_dir+directory+"/"+file)
        img=Image.open(validation_data_dir+directory+"/"+file).convert('L')
        featurevector=np.array(img).flatten() 
        X_test.append(featurevector)
        Y_test.append(directory)

config = load_json('cascade.json')
gc = GCForest(config)  # should be a dict

X_train = np.array(X_train)
Y_train = np.array(Y_train)
Y_train = Y_train.reshape(320, 1)
X_test = np.array(X_test)
Y_test = np.array(Y_test)
Y_test = Y_test.reshape(80, 1)

X_train_enc = gc.fit_transform(X_train, Y_train)
pred_X = gc.predict(X_test)
print(pred_X)
# evaluating accuracy
accuracy = accuracy_score(y_true=Y_test, y_pred=pred_X)
print('gcForest accuracy : {}'.format(accuracy))

cascade.json

{
"cascade": {
    "random_state": 0,
    "max_layers": 100,
    "early_stopping_rounds": 3,
    "n_classes": 2,
    "estimators": [
        {"n_folds":5,"type":"XGBClassifier","n_estimators":2,"max_depth":5,"objective":"multi:softprob", "silent":true, "nthread":-1, "learning_rate":0.1},
        {"n_folds":5,"type":"RandomForestClassifier","n_estimators":2,"max_depth":null,"n_jobs":-1},
        {"n_folds":5,"type":"ExtraTreesClassifier","n_estimators":2,"max_depth":null,"n_jobs":-1},
        {"n_folds":5,"type":"LogisticRegression"}
    ]
}
}

But when I run main.py, there is an error:

File "/home/maroon/tmp/gcForest/lib/gcforest/estimators/kfold_wrapper.py", line 71, in fit_transform
    assert len(X.shape) == len(y.shape) + 1
AssertionError

I have spent a lot of time on this but still can't solve it. How should I handle an image dataset, and is there more material I can refer to?

marooncn commented 6 years ago

The solution.
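
For readers who hit the same assertion, here is a minimal sketch of one likely cause, judging only from the traceback quoted in the question. The check `len(X.shape) == len(y.shape) + 1` means that a 2-D `X` (one flattened image per row) needs a 1-D `y`, whereas `Y_train.reshape(320, 1)` in main.py makes `y` 2-D. The sketch uses tiny stand-in arrays rather than the real images. The helper `encode_labels` is hypothetical, and converting the folder names to integer ids is an assumption about what the estimators expect, not something the traceback confirms.

```python
import numpy as np


def encode_labels(names):
    """Hypothetical helper: map string class names to integer ids.

    Returns a 1-D integer array and the sorted array of class names.
    """
    classes, ids = np.unique(np.asarray(names), return_inverse=True)
    return ids.ravel(), classes


# Stand-in data laid out like main.py: one flattened image per row.
# Tiny 4x4 "images" are used here instead of 256x256.
X_train = np.zeros((6, 4 * 4), dtype=np.uint8)
Y_names = ["normal", "normal", "normal", "defect", "defect", "defect"]

Y_train, classes = encode_labels(Y_names)

# Same condition as the assertion in the traceback: 2 == 1 + 1.
assert len(X_train.shape) == len(Y_train.shape) + 1

# The reshape used in main.py turns y into a 2-D column,
# so the same condition no longer holds.
Y_bad = Y_train.reshape(len(Y_train), 1)
print(len(X_train.shape) == len(Y_bad.shape) + 1)  # False
```

Under these assumptions, main.py would drop both `reshape` calls and pass the 1-D arrays to `gc.fit_transform(X_train, Y_train)` and `accuracy_score`, using the same name-to-id mapping for `Y_test`.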