trainorp closed this issue 2 years ago
Hi Patrick, there is no such option at this point; I'll add it in the next release.
Regards
Hi @trainorp, I've been researching this, and unfortunately it looks like DEAP (the package used for all the genetic optimization) doesn't have an option to set a random seed. I added a random_state variable in the only part that this package has control over, so it's just a partial implementation.
You can find more details in PR #97; this will be released in version 0.9.0.
If this option becomes available in DEAP in the future, I'll add it.
@trainorp with this PR you can use the following workaround, which seems to work: set the random seed in each individual class that accepts this parameter, and define the seed at the top of the file that runs the algorithm. For example:
import numpy as np
import random
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous, Categorical, Integer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
random_seed = 54
np.random.seed(random_seed)
random.seed(random_seed)
data = load_digits()
n_samples = len(data.images)
X = data.images.reshape((n_samples, -1))
y = data['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=random_seed)
clf = RandomForestClassifier(random_state=random_seed)
param_grid = {'min_weight_fraction_leaf': Continuous(0.01, 0.5, distribution='log-uniform', random_state=random_seed),
              'bootstrap': Categorical([True, False], random_state=random_seed),
              'max_depth': Integer(2, 30, random_state=random_seed),
              'max_leaf_nodes': Integer(2, 35, random_state=random_seed),
              'n_estimators': Integer(100, 300, random_state=random_seed)}
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=random_seed)
evolved_estimator = GASearchCV(estimator=clf,
                               cv=cv,
                               scoring='accuracy',
                               population_size=8,
                               generations=5,
                               param_grid=param_grid,
                               n_jobs=-1,
                               verbose=True,
                               keep_top_k=4)
# Train and optimize the estimator
evolved_estimator.fit(X_train, y_train)
# Best parameters found
print(evolved_estimator.best_params_)
# Use the model fitted with the best parameters
y_predict_ga = evolved_estimator.predict(X_test)
print(accuracy_score(y_test, y_predict_ga))
# Saved metadata for further analysis
print("Stats achieved in each generation: ", evolved_estimator.history)
print("Best k solutions: ", evolved_estimator.hof)
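To see why seeding at the top of the file can help, here is a minimal standalone sketch using only the standard library. It assumes, without this thread confirming it, that the genetic operators draw their numbers from Python's global `random` generator. The helper `sample_population` is hypothetical and is not part of sklearn-genetic-opt or DEAP.

```python
import random


def sample_population(seed, size=8):
    """Hypothetical helper: seed the global generator, then draw
    `size` integers in [2, 30], mimicking a max_depth-like gene."""
    random.seed(seed)
    return [random.randint(2, 30) for _ in range(size)]


# Two runs with the same seed produce identical draws.
first_run = sample_population(54)
second_run = sample_population(54)

print(first_run)
print(first_run == second_run)
```

If the assumption holds, any code that relies on the global generator after `random.seed(random_seed)` would repeat the same sequence on every run, which is what the workaround above depends on.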
Rodrigo,
This is so awesome. Thank you!
-Patrick
Hi Rodrigo,
Is there a way to make the GA search reproducible? Something like setting a random seed? Thanks!
-Patrick