Hi @Doradx,
First of all, this library is for solving continuous-domain problems, but you can use a trick to solve discrete problems.
So based on your message, I can guide you through two things.
c1 can be [0.5, 1.0, 1.5, 2.0] # local coefficient
c2 can be [0.5, 1.0, 1.5, 2.0] # global coefficient
w_min can be [0.2, 0.4]
w_max can be [0.8, 0.9, 1.0] # inertia weight of the particle, decreasing linearly from w_max to w_min
from sklearn.model_selection import ParameterGrid
from mealpy.swarm_based.PSO import BasePSO
from numpy import sum

def objective_func(solution):
    return sum(solution ** 2)

verbose = True
epoch = 100
pop_size = 50
lb = [-3, -5, 1]
ub = [5, 10, 100]

list_coefs = {
    "c1": [0.5, 1.0, 1.5, 2.0],
    "c2": [0.5, 1.0, 1.5, 2.0],
    "w_min": [0.2, 0.4],
    "w_max": [0.8, 0.9, 1.0]
}

for item in list(ParameterGrid(list_coefs)):
    model = BasePSO(objective_func, lb, ub, verbose, epoch, pop_size,
                    c1=item["c1"], c2=item["c2"], w_min=item["w_min"], w_max=item["w_max"])
    best_position, best_fitness, list_loss = model.train()
    print(model.solution[0])    # best position found with this coefficient setting
    print(model.solution[1])    # best fitness found with this coefficient setting
    print(model.loss_train)     # best fitness per epoch
    # Save your results in Excel or CSV for later comparison
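If it helps, one way to do that last step is to collect one row per coefficient setting and write a single CSV at the end. A minimal sketch, assuming best_fitness returned by model.train() is a scalar and using a file name I made up:

import pandas as pd

results = []
for item in list(ParameterGrid(list_coefs)):
    model = BasePSO(objective_func, lb, ub, verbose, epoch, pop_size,
                    c1=item["c1"], c2=item["c2"], w_min=item["w_min"], w_max=item["w_max"])
    best_position, best_fitness, list_loss = model.train()
    results.append({**item, "best_fitness": best_fitness})   # one row per coefficient setting

df = pd.DataFrame(results).sort_values("best_fitness")       # smallest objective value first
df.to_csv("pso_coefficient_comparison.csv", index=False)     # hypothetical file name
print(df.head())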
Second, take the parameters_dist from your message:
import numpy as np

parameters_dist = {
    'c': np.linspace(0, 10, 11),                    # integers starting from 0, step = 1
    'g': np.linspace(0, 5, 6),                      # integers starting from 0, step = 1
    "opt": ["adam", "SGD", "adagrad", "RMSprop"],   # categorical variable
    "batch_size": [32, 64, 128, 256],               # integers not starting from 0, each value doubling the previous one
    "rd": [0.1, 1.3, 2.0, 5.0],                     # values not starting from 0, with irregular steps
}
=> From here we see that:
c value can be [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
g value can be [0, 1, 2, 3, 4, 5]
opt value can be ["adam", "SGD", "adagrad", "RMSprop"]
batch_size value can be [32, 64, 128, 256]
rd value can be [0.1, 1.3, 2.0, 5.0]
So the trick here is to encode all discrete and categorical variables as continuous variables, then decode them back to their original values. We can do this inside the objective function.
from sklearn.preprocessing import LabelEncoder

# Handle the categorical variable first
OPT_ENCODER = LabelEncoder()
OPT_ENCODER.fit(['adam', 'SGD', 'adagrad', 'RMSprop'])

# Next, write a function to handle the variable with irregular steps.
def decode_rd(rd_value):
    if rd_value == 0:
        return 0.1
    elif rd_value == 1:
        return 1.3
    elif rd_value == 2:
        return 2.0
    elif rd_value == 3:
        return 5.0
    else:
        raise ValueError("rd_value must be 0, 1, 2 or 3")

## From here you can see the lower bound for the rd variable will be 0, and the upper bound will be 3.99.
## Why 3.99? I will explain later.
# Define an objective function
def objective_function(solution):
    # c = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # g = [0, 1, 2, 3, 4, 5]
    # opt = ["adam", "SGD", "adagrad", "RMSprop"]
    # batch_size = [32, 64, 128, 256]
    # rd = [0.1, 1.3, 2.0, 5.0]
    solution = solution.astype(int)
    ## Decode the continuous variables to integer/categorical variables to calculate the objective function.
    ## As you can see, rd_value has 4 different values, and the if-conditions above start from 0.
    ## Therefore the maximum bound for rd_value is 3.99, because we use the .astype(int) function above.
    ## Imagine:
    ## 1st value of rd_value (0.1) has the range: 0 to 0.99
    ## 2nd value of rd_value (1.3) has the range: 1 to 1.99
    ## 3rd value of rd_value (2.0) has the range: 2 to 2.99
    ## 4th value of rd_value (5.0) has the range: 3 to 3.99
    ## This encode-decode mechanism ensures a balanced range among the different cases of each variable.
    c = solution[0]                 # lb = 0, ub = 10.99
    g = solution[1]                 # lb = 0, ub = 5.99
    opt = solution[2]               # lb = 0, ub = 3.99 (4 categories)
    batch_size = 2 ** solution[3]   # lb = 5, ub = 8.99, because 2^5 = 32, 2^6 = 64, ..., 2^8 = 256
    rd = solution[4]                # lb = 0, ub = 3.99

    # Time to decode the solution to actual values
    # c, g, batch_size: already decoded above
    optimizer = OPT_ENCODER.inverse_transform([opt])[0]   # returns a string such as "adam" or "SGD"
    rd = decode_rd(rd)              # use the decode function to get the real value of the rd variable

    ## Calculate your objective value based on the real variables above.
    obj_value = ...                 # build and evaluate your model here
    return obj_value
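## Quick sanity check of the truncation trick described above (just an illustration, assuming numpy is imported as np):
print(np.array([0.2, 1.7, 2.5, 3.99]).astype(int))   # -> [0 1 2 3], i.e. [0, 0.99]->0, [1, 1.99]->1, [2, 2.99]->2, [3, 3.99]->3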
# Now we define the lower/upper bounds and run the algorithm
LB = [0, 0, 0, 5, 0]
UB = [10.99, 5.99, 3.99, 8.99, 3.99]
MAX_GEN = 100
POP_SIZE = 50

list_coefs = {
    "c1": [0.5, 1.0, 1.5, 2.0],
    "c2": [0.5, 1.0, 1.5, 2.0],
    "w_min": [0.2, 0.4],
    "w_max": [0.8, 0.9, 1.0]
}

for item in list(ParameterGrid(list_coefs)):
    model = BasePSO(objective_function, LB, UB, verbose, MAX_GEN, POP_SIZE,
                    c1=item["c1"], c2=item["c2"], w_min=item["w_min"], w_max=item["w_max"])
    best_position, best_fitness, list_loss = model.train()
    print(model.solution[0])
    print(model.solution[1])
    print(model.loss_train)
    # Save your results in Excel or CSV for later comparison
Hope it is helpful for you.
Thanks for your detailed reply. But my problem is inside the objective function. For example, I want to use BasePSO to optimize the c and g parameters of sklearn.svm.SVR, and the SVR is called inside the objective function. So we should define lb and ub for c and g, which are SVR parameters, not BasePSO parameters. How should I deal with the lb and ub?
Hi @Doradx
You should first learn about SVR and about metaheuristic algorithms, then learn the difference between optimizing hyper-parameters and optimizing parameters. Right now you don't seem to know exactly what you are trying to do. If you are asking how to choose the lower and upper bounds of "c" and "g" for SVR, that is a question for the sklearn team, not for me. Here is a complete example of how to optimize the hyper-parameters of an SVR model.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.preprocessing import LabelEncoder
from mealpy.swarm_based.PSO import BasePSO

# Handle the categorical variable first
G_ENCODER = LabelEncoder()
G_ENCODER.fit(['scale', 'auto'])

# Load the data
df = pd.read_csv('Support-Vector-Regression-Data.csv')
x = df.x.values.reshape(-1, 1)
y = df.y.values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=42)

# Define an objective function
def objective_function(solution):
    # c = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
    # g = ['scale', 'auto']
    solution = solution.astype(int)
    c_temp = solution[0]   # lb = 1, ub = 8.99
    g_temp = solution[1]   # lb = 0, ub = 1.99

    # Time to decode the solution to actual values
    c_optimized = 0.25 * c_temp   # because c_temp can only be [1, 2, 3, 4, 5, 6, 7, 8]
    g_optimized = G_ENCODER.inverse_transform([g_temp])[0]

    ## Calculate your objective value based on the real variables above.
    svr_model = SVR(C=c_optimized, gamma=g_optimized)   # pass the decoded parameters
    svr_model.fit(x_train, y_train)
    y_pred = svr_model.predict(x_test)
    mse = mean_squared_error(y_test, y_pred)
    objective_value = rmse = np.sqrt(mse)
    return objective_value

# Now we define the lower/upper bounds and run the algorithm
LB = [1, 0]          # [lower bound for c, lower bound for g]
UB = [8.99, 1.99]    # [upper bound for c, upper bound for g]
MAX_GEN = 100
POP_SIZE = 50
verbose = True

model = BasePSO(objective_function, LB, UB, verbose, MAX_GEN, POP_SIZE)
best_position, best_fitness, list_loss = model.train()

print(model.solution[0])
# This prints the best (optimized) encoded values of c and g. We just need to decode them:
c_optimized = 0.25 * int(model.solution[0][0])
g_optimized = G_ENCODER.inverse_transform([int(model.solution[0][1])])[0]
print(f"Best c = {c_optimized}, g = {g_optimized}")
Best regards
Thanks, @thieu1995. I have used a similar approach for this problem before, but the results were not good. Thanks for your reply.
Hi @Doradx, then why do you want to use metaheuristics to solve this problem? You can just use brute force to search for the best hyperparameters.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, ParameterGrid
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error

# Load the data
df = pd.read_csv('Support-Vector-Regression-Data.csv')
x = df.x.values.reshape(-1, 1)
y = df.y.values
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=42)

list_coefficients = {
    "c": [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0],
    "g": ['scale', 'auto']
}

for item in list(ParameterGrid(list_coefficients)):
    svr_model = SVR(C=item["c"], gamma=item["g"])
    svr_model.fit(x_train, y_train)
    y_pred = svr_model.predict(x_test)
    mse = mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    item["rmse"] = rmse
    # Save item to a CSV file
# Open the CSV, sort the rows by the RMSE column,
# and pick your best "c" and "g" values.
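To make that save-and-sort step concrete, here is a minimal sketch; the results list and the CSV file name are placeholders I am assuming, not part of the original code:

results = []
for item in list(ParameterGrid(list_coefficients)):
    svr_model = SVR(C=item["c"], gamma=item["g"])
    svr_model.fit(x_train, y_train)
    y_pred = svr_model.predict(x_test)
    item["rmse"] = np.sqrt(mean_squared_error(y_test, y_pred))
    results.append(item)                                    # keep every evaluated setting

df_results = pd.DataFrame(results).sort_values("rmse")      # lowest RMSE first
df_results.to_csv("svr_grid_results.csv", index=False)      # hypothetical file name
print(df_results.iloc[0])                                   # best "c" and "g" values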
The problem with metaheuristic algorithms is that they are approximation methods. You can never know whether the outcome is the global best or not, so the results may not be good in some cases.
@thieu1995 In fact, sklearn provides GridSearchCV and RandomizedSearchCV to solve this problem. But many researchers have also used PSO, GWO, and ABC to do this job. This library provides a lot of metaheuristic algorithms, so I want to compare their performance in hyperparameter optimization. Thank you for your kind reply!
@Doradx FYI
In fact, brute force is GridSearchCV. The results of GridSearchCV will always be at least as good as RandomizedSearchCV or metaheuristics, simply because it tries every possible combination. So you can only meaningfully compare the results of RandomizedSearchCV and metaheuristics.
RandomizedSearchCV samples parameters from specified distributions, while metaheuristics use their own operators to select parameters. Both are approximation methods.
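For reference, a minimal sketch of the two sklearn searchers mentioned above on the same SVR problem; it reuses x_train and y_train from the earlier snippet, and the scoring string, cv value, and n_iter are my choices, not something specified in this thread:

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVR

param_grid = {
    "C": [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0],
    "gamma": ["scale", "auto"],
}

# Exhaustive search: tries every combination in the grid
grid_search = GridSearchCV(SVR(), param_grid, scoring="neg_root_mean_squared_error", cv=5)
grid_search.fit(x_train, y_train)
print(grid_search.best_params_, grid_search.best_score_)

# Randomized search: samples a fixed number of settings from the same space
random_search = RandomizedSearchCV(SVR(), param_distributions=param_grid, n_iter=8,
                                   scoring="neg_root_mean_squared_error", cv=5, random_state=42)
random_search.fit(x_train, y_train)
print(random_search.best_params_, random_search.best_score_)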
@thieu1995 Thanks for your reply. I have dealt with it.
I have seen the Mealpy repository on GitHub and have used its algorithms for hyper-parameter optimization. It's a very nice piece of work!
But when optimizing integer or discrete parameters, how should I define the lb and ub?
such as:
How should I pass them to the algorithm?
Best regards, Dorad, cug.xia@gmail.com