lacava / few

a feature engineering wrapper for sklearn
https://lacava.github.io/few
GNU General Public License v3.0

random numbers seed not working? #37

Closed echo66 closed 6 years ago

echo66 commented 6 years ago

Greetings!

I have the following code:

feats_gen = FEW(
                ml=DecisionTreeClassifier(random_state=10, max_depth=None, min_samples_leaf=5), 
                population_size=100, tourn_size=2,                 
                mutation_rate=0.5, crossover_rate=0.5, 
                sel='epsilon_lexicase',   
                clean=True,                
                mdr=True, boolean=True, 
                random_state=10, verbosity=1, 
                scoring_function=roc_auc_score, 
                max_depth=10, min_depth=1, max_depth_init=1, 
                classification=True, 
                generations=50, max_stall=None, 
                names=list(X_train.select_dtypes(include=[np.number]).columns))

feats_gen.fit(X_train.select_dtypes(include=[np.number]).values, 
              y_train.astype(int).values)

test_ = preprocessing_pipeline.transform(e.test)

X_test = test_.X
y_test = test_[test_.target_name].astype(int)

X_test_num = X_test.select_dtypes(include=[np.number]).values
y_score = feats_gen._best_estimator.predict_proba(feats_gen.transform(X_test_num))[:, 1]
roc_auc_score(y_test, y_score)

Every time I run this code, I get different ROC AUC values on both the training and test sets, even though random_state is set on both FEW and the decision tree. I'm pretty sure preprocessing_pipeline is deterministic.

echo66 commented 6 years ago

I found out what the problem is: several lines call the global numpy.random functions directly instead of drawing from a seeded RandomState, so the random_state argument has no effect on those draws.
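To make the failure mode concrete, here is a minimal sketch in plain NumPy. It is not FEW's actual code: `mutate_global` and `mutate_seeded` are hypothetical helpers that only illustrate why a seed argument is ignored when a function draws from the global `np.random` generator.

```python
import numpy as np


def mutate_global(values, random_state=None):
    """Hypothetical buggy helper: accepts a seed but draws from the global generator."""
    rng = np.random.RandomState(random_state)  # created, then never used
    return values + np.random.normal(size=len(values))


def mutate_seeded(values, random_state=None):
    """Hypothetical fixed helper: every draw comes from the seeded RandomState."""
    rng = np.random.RandomState(random_state)
    return values + rng.normal(size=len(values))


values = np.zeros(5)

# Same seed, two calls: the seeded helper repeats itself exactly.
seeded_same = np.array_equal(mutate_seeded(values, 10), mutate_seeded(values, 10))

# Same seed, two calls: the global helper keeps advancing the shared global state.
global_same = np.array_equal(mutate_global(values, 10), mutate_global(values, 10))

print("seeded helper reproducible:", seeded_same)
print("global helper reproducible:", global_same)
```

The seeded helper returns identical arrays for identical seeds, while the global one does not, which would match the symptom of ROC AUC changing between runs.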

lacava commented 6 years ago

thanks for raising this issue. I am on vacation at the moment. if you have a fix, could you make the changes and send a PR? I would appreciate it!

lacava commented 6 years ago

looks like a couple np.random calls slipped into the code, as you mentioned. see PR #38

running docs/few_example.py now seems to give reproducible results.
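For anyone who wants to check this kind of fix on their own code, a rough sketch of the usual pattern follows. `TinyEvolver` is a made-up stand-in, not FEW and not the change in the PR: it builds one RandomState per fit, passes it to every helper, and is then fitted twice with the same seed to compare results.

```python
import numpy as np


class TinyEvolver:
    """Hypothetical stand-in for an estimator with a stochastic fit; not FEW itself."""

    def __init__(self, population_size=10, generations=5, random_state=None):
        self.population_size = population_size
        self.generations = generations
        self.random_state = random_state

    def fit(self, X):
        # One generator per fit, created from the seed and handed to every helper.
        rng = np.random.RandomState(self.random_state)
        weights = rng.normal(size=(self.population_size, X.shape[1]))
        for _ in range(self.generations):
            weights = self._mutate(weights, rng)
        self.weights_ = weights
        return self

    @staticmethod
    def _mutate(weights, rng):
        # Helpers receive the generator as an argument instead of calling np.random.
        mask = rng.uniform(size=weights.shape) < 0.5
        return weights + mask * rng.normal(size=weights.shape)


X = np.ones((20, 3))
first = TinyEvolver(random_state=10).fit(X).weights_
second = TinyEvolver(random_state=10).fit(X).weights_
other = TinyEvolver(random_state=11).fit(X).weights_

same_seed_matches = np.array_equal(first, second)
different_seed_differs = not np.array_equal(first, other)
print("same seed reproduces:", same_seed_matches)
print("different seed differs:", different_seed_differs)
```

Comparing two fits with the same seed is a cheap regression check: if a stray global `np.random` call slips back in, the first comparison is very likely to fail.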