pylablanche / gcForest

Python implementation of deep forest method : gcForest
MIT License

issues about the proportion of the test_size #24

Open yancychy opened 5 years ago

yancychy commented 5 years ago

Hi, when using gcForest I tried different test_size values: 0.3, 0.2 and 0.1. With test_size=0.1, gcForest() raises the errors shown below. I see that cascade_test_size defaults to 0.2. Is that the problem? Thanks.

Code

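from sklearn.model_selection import train_test_split
from GCForest import gcForest  # GCForest.py from the gcForest-master checkout

X_tr, X_te, y_tr, y_te = train_test_split(sX, sY, test_size=0.2)
gcf = gcForest(shape_1X=[1, X_tr.shape[1]], window=50, tolerance=0.0)
gcf.fit(X_tr, y_tr)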

Errors:

/Users/cheny/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/ensemble/forest.py:458: UserWarning: Some inputs do not have OOB scores. This probably means too few trees were used to compute any reliable oob estimates.
  warn("Some inputs do not have OOB scores. "
/Users/cheny/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/ensemble/forest.py:463: RuntimeWarning: divide by zero encountered in true_divide
  predictions[k].sum(axis=1)[:, np.newaxis])
/Users/cheny/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/ensemble/forest.py:463: RuntimeWarning: invalid value encountered in true_divide
  predictions[k].sum(axis=1)[:, np.newaxis])
Adding/Training Layer, n_layer=1
ValueError                                Traceback (most recent call last)

in
      1 gcf = gcForest(shape_1X= [1,X_tr.shape[1]], window=50, tolerance=0.0)
----> 2 gcf.fit(X_tr, y_tr)

~/Documents/tools/deepL/gcForest-master/GCForest.py in fit(self, X, y)
    124
    125         mgs_X = self.mg_scanning(X, y)
--> 126         _ = self.cascade_forest(mgs_X, y)
    127
    128     def predict_proba(self, X):

~/Documents/tools/deepL/gcForest-master/GCForest.py in cascade_forest(self, X, y)
    345
    346             self.n_layer += 1
--> 347             prf_crf_pred_ref = self._cascade_layer(X_train, y_train)
    348             accuracy_ref = self._cascade_evaluation(X_test, y_test)
    349             feat_arr = self._create_feat_arr(X_train, prf_crf_pred_ref)

~/Documents/tools/deepL/gcForest-master/GCForest.py in _cascade_layer(self, X, y, layer)
    409         print('Adding/Training Layer, n_layer={}'.format(self.n_layer))
    410         for irf in range(n_cascadeRF):
--> 411             prf.fit(X, y)
    412             crf.fit(X, y)
    413             setattr(self, '_casprf{}_{}'.format(self.n_layer, irf), prf)

~/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
    248
    249         # Validate or convert input data
--> 250         X = check_array(X, accept_sparse="csc", dtype=DTYPE)
    251         y = check_array(y, accept_sparse='csc', ensure_2d=False, dtype=None)
    252         if sample_weight is not None:

~/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    571         if force_all_finite:
    572             _assert_all_finite(array,
--> 573                                allow_nan=force_all_finite == 'allow-nan')
    574
    575     shape_repr = _shape_repr(array.shape)

~/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
     54                 not allow_nan and not np.isfinite(X).all()):
     55             type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 56             raise ValueError(msg_err.format(type_err, X.dtype))
     57
     58

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

samxrl commented 5 years ago

I had the same problem, but once I set n_mgsRFtree > 40 the error no longer occurred. Maybe you should set a higher n_mgsRFtree.
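
A rough way to see why more trees could help: the warnings in the log say that some inputs have no OOB (out-of-bag) scores, and the divide-by-zero that follows would plausibly leave NaN values in the features handed to the cascade layer, which is where check_array fails. The sketch below is not taken from gcForest. It only estimates how many training rows end up in-bag for every tree, assuming each tree draws a bootstrap sample of the same size as the training set, with replacement. The function names and the row count are hypothetical.

```python
def prob_never_oob(n_rows, n_trees):
    """Probability that one training row is in-bag for every tree.

    Assumes each tree draws n_rows rows with replacement (a plain bootstrap).
    """
    # Chance that a given row appears at least once in a single bootstrap sample.
    p_in_bag = 1.0 - (1.0 - 1.0 / n_rows) ** n_rows
    # The row has no OOB prediction only if it is in-bag for all trees.
    return p_in_bag ** n_trees


def expected_rows_without_oob(n_rows, n_trees):
    """Expected number of rows that never receive an OOB prediction."""
    return n_rows * prob_never_oob(n_rows, n_trees)


if __name__ == "__main__":
    n_rows = 100_000  # hypothetical: window scanning can multiply the row count
    for n_trees in (10, 30, 40, 100):
        print(n_trees, expected_rows_without_oob(n_rows, n_trees))
```

The expected count shrinks geometrically with the number of trees, which is consistent with the error disappearing at a higher n_mgsRFtree. Assuming n_mgsRFtree is accepted as a constructor keyword, the original call would become something like gcForest(shape_1X=[1, X_tr.shape[1]], window=50, tolerance=0.0, n_mgsRFtree=50).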