thieu1995 / mealpy

A Collection Of The State-of-the-art Metaheuristic Algorithms In Python (Metaheuristic/Optimizer/Nature-inspired/Biology)
https://mealpy.readthedocs.io
GNU General Public License v3.0

How to deal with integer or discrete parameters? #23

Closed Doradx closed 3 years ago

Doradx commented 3 years ago

I have seen the Mealpy repository on GitHub and have used its algorithms for hyper-parameter optimization. It's a very nice piece of work!
But when optimizing integer or discrete parameters, how should the lb and ub be defined?

such as:

import numpy as np

parameters_dist = {
    'c': np.linspace(0, 10, 11),
    'g': np.linspace(0, 5, 6)
}

How should I pass them to the algorithm?

Best regards, Dorad, cug.xia@gmail.com

thieu1995 commented 3 years ago

Hi @Doradx ,

First of all, this library is designed for continuous-domain problems, but you can use a trick to handle discrete problems.

Based on your message, I can guide you through two things.

  1. How to optimize hyper-parameters. For example, the PSO algorithm has 4 hyper-parameters: c1, c2, w_min and w_max.
    c1 can be [0.5, 1.0, 1.5, 2.0]      # local coefficient
    c2 can be [0.5, 1.0, 1.5, 2.0]      # global coefficient 
    w_min can be [0.2, 0.4]
    w_max can be [0.8, 0.9, 1.0]        # weight of the bird, decreasing linearly from w_max to w_min 

from sklearn.model_selection import ParameterGrid
from mealpy.swarm_based.PSO import BasePSO
import numpy as np

def objective_func(solution):
    # Dummy objective (sphere function); replace it with your own.
    return np.sum(solution ** 2)

verbose = True
epoch = 100
pop_size = 50
lb = [-3, -5, 1]
ub = [5, 10, 100]

list_coefs = {
    "c1": [0.5, 1.0, 1.5, 2.0],
    "c2": [0.5, 1.0, 1.5, 2.0],
    "w_min": [0.2, 0.4],
    "w_max": [0.8, 0.9, 1.0]
}
for item in list(ParameterGrid(list_coefs)):
    model = BasePSO(objective_func, lb, ub, verbose, epoch, pop_size, 
        c1=item["c1"], c2=item["c2"], w_min=item["w_min"], w_max=item["w_max"])
    best_position, best_fitness, list_loss = model.train()
    print(model.solution[0])    # best position found
    print(model.solution[1])    # best fitness found
    print(model.loss_train)     # best fitness recorded at each epoch
    # Save your results in Excel or CSV for later comparison 
  2. How to optimize discrete variables (I guess this is what you are really asking). The key is how to design the objective function. Let's solve your problem. You want to optimize two variables, "c" and "g". They can be discrete values or categorical labels. If they are categorical labels, you can use LabelEncoder to encode them into integer values. I added 3 more variables to demonstrate the different ways to crack it down.
parameters_dist = {
    'c': np.linspace(0, 10, 11),                    # integers from 0 to 10, step = 1
    'g': np.linspace(0, 5, 6),                      # integers from 0 to 5, step = 1
    "opt": ["adam", "SGD", "adagrad", "RMSprop"],   # categorical variable
    "batch_size": [32, 64, 128, 256],               # integers not starting from 0, each step doubles the value
    "rd": [0.1, 1.3, 2.0, 5.0],                     # float values with irregular steps
}

=> From here we see that:
    c value can be [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    g value can be [0, 1, 2, 3, 4, 5]
    opt value can be ["adam", "SGD", "adagrad", "RMSprop"]
    batch_size value can be [32, 64, 128, 256]
    rd value can be [0.1, 1.3, 2.0, 5.0]

So the trick is to encode every discrete or categorical variable as a continuous variable, then decode it back to the original value inside the objective function.


from sklearn.preprocessing import LabelEncoder

# Handle categorical variable first 
OPT_ENCODER = LabelEncoder()
OPT_ENCODER.fit(['adam', 'SGD', 'adagrad', 'RMSprop'])

# Next, write a function to handle the variable with irregular steps.
def decode_rd(rd_value):
    if rd_value == 0:
        return 0.1
    elif rd_value == 1:
        return 1.3
    elif rd_value == 2:
        return 2.0
    elif rd_value == 3:
        return 5.0
    else:
        raise ValueError("rd_value must be an integer in [0, 3]")
    ## From here you can see that the lower bound for the rd variable will be 0 and the upper bound will be 3.99.
    ## Why 3.99? I will explain below.

# Define an objective function 
def objective_function(solution):
    # c = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # g = [0, 1, 2, 3, 4, 5]
    # opt = ["adam", "SGD", "adagrad", "RMSprop"]
    # batch_size = [32, 64, 128, 256]
    # rd = [0.1, 1.3, 2.0, 5.0]

    solution = solution.astype(int)
    ## Decode the continuous variables to integer/categorical variables to calculate the objective function.
    ## As you can see, rd_value has 4 different values, and the if-conditions start from 0.
    ## Therefore, the maximum value for rd_value will be 3.99, because we use the .astype(int) function above.
    ## Imagine:
    ##      1st value of rd_value (0.1) covers the range: 0 to 0.99
    ##      2nd value of rd_value (1.3) covers the range: 1 to 1.99
    ##      3rd value of rd_value (2.0) covers the range: 2 to 2.99
    ##      4th value of rd_value (5.0) covers the range: 3 to 3.99
    ## This encode-decode mechanism keeps the ranges balanced across the different cases of each variable.

    c = solution[0]                 # lb = 0, ub = 10.99
    g = solution[1]                 # lb = 0, ub = 5.99
    opt = solution[2]               # lb = 0, ub = 3.99 (4 categories)
    batch_size = 2 ** solution[3]   # lb = 5, ub = 8.99,    because 2^5 = 32, 2^6 = 64, ..., 2^8 = 256
    rd = solution[4]                # lb = 0, ub = 3.99 

    # Time to decode the solution to actual value 
    # c, g, batch_size: we already decoded above 
    optimizer = OPT_ENCODER.inverse_transform([opt])[0]     # This returns a string such as "adam" or "SGD"
    rd = decode_rd(rd)      # Use the decode function to get the real value of the rd variable 

    ## Calculate your objective value based on the real variables above.
    obj_value = ....

    return obj_value

# So now we define lower/upper bound and algorithm 

LB = [0, 0, 0, 5, 0]
UB = [10.99, 5.99, 3.99, 8.99, 3.99]
MAX_GEN = 100
POP_SIZE = 50

list_coefs = {
    "c1": [0.5, 1.0, 1.5, 2.0],
    "c2": [0.5, 1.0, 1.5, 2.0],
    "w_min": [0.2, 0.4],
    "w_max": [0.8, 0.9, 1.0]
}
for item in list(ParameterGrid(list_coefs)):
    model = BasePSO(objective_function, LB, UB, verbose, MAX_GEN, POP_SIZE, 
        c1=item["c1"], c2=item["c2"], w_min=item["w_min"], w_max=item["w_max"])
    best_position, best_fitness, list_loss = model.train()
    print(model.solution[0])
    print(model.solution[1])
    print(model.loss_train)
    # Save your results in Excel or CSV for later comparison 
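
For a quick sanity check of the decoding (my own illustration; the position values below are made up, just to show the idea), any continuous position the optimizer suggests within LB/UB maps back to a valid original value after .astype(int). It reuses OPT_ENCODER and decode_rd defined above:

import numpy as np

raw_position = np.array([7.4, 2.3, 1.8, 6.2, 3.7])    # hypothetical position inside LB/UB
ints = raw_position.astype(int)                        # -> [7, 2, 1, 6, 3]
print(ints[0], ints[1])                                # c = 7, g = 2
print(OPT_ENCODER.inverse_transform([ints[2]])[0])     # maps back to one of the optimizer names
print(2 ** ints[3])                                    # batch_size = 64
print(decode_rd(ints[4]))                              # rd = 5.0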

Hope it is helpful for you.

Doradx commented 3 years ago

Thanks for your detailed reply, but my problem is inside the objective function. For example: I want to use BasePSO to optimize the c and g of sklearn.svm.SVR, which is used inside the objective function. So we need to define the lb and ub for c and g, which belong to SVR, not to BasePSO. How should I deal with the lb and ub?

thieu1995 commented 3 years ago

Hi @Doradx

You should first learn about SVR and about metaheuristic algorithms, then learn the difference between optimizing hyper-parameters and optimizing parameters. Right now you don't seem to know what you are trying to do. If you are asking me how to choose the lower and upper bounds of "c" and "g" for SVR, then ask the sklearn team, not me. Here is a complete example of how to optimize the hyper-parameters of an SVR model.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.preprocessing import LabelEncoder
from mealpy.swarm_based.PSO import BasePSO

# Handle the categorical variable first 
G_ENCODER = LabelEncoder()
G_ENCODER.fit(['scale', 'auto'])

# Load the data
df = pd.read_csv('Support-Vector-Regression-Data.csv')
x = df.x.values.reshape(-1, 1)
y = df.y.values.reshape(-1, 1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=42)

# Define an objective function 
def objective_function(solution):
    # c = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
    # g = ['scale', 'auto']

    solution = solution.astype(int)     

    c_temp = solution[0]        # lb = 1, ub = 8.99
    g_temp = solution[1]        # lb = 0, ub = 1.99

    # Time to decode the solution to actual value 
    c_optimized = 0.25 * c_temp     # because c_temp value can only be [1, 2, 3, 4, 5, 6, 7, 8]
    g_optimized = G_ENCODER.inverse_transform([g_temp])[0]

    ## Calculate your objective value based on above real variable. 

    svr_model = SVR(C=c_optimized, gamma=g_optimized)       # Pass the optimized parameters decoded from the solution
    svr_model.fit(x_train, y_train.ravel())

    y_pred = svr_model.predict(x_test)
    mse =  mean_squared_error(y_test, y_pred)
    objective_value = rmse = np.sqrt(mse)

    return objective_value

# So now we define lower/upper bound and algorithm 

LB = [1, 0]         # [lowerbound for c, lowerbound for g]
UB = [8.99, 1.99]   # [upperbound for c, upperbound for g]
MAX_GEN = 100
POP_SIZE = 50
verbose = True

model = BasePSO(objective_function, LB, UB, verbose, MAX_GEN, POP_SIZE)
best_position, best_fitness, list_loss = model.train()
print(model.solution[0])
# This will print out the best (optimized) position for c and g. You just need the same decoding step:
c_optimized = 0.25 * int(model.solution[0][0])
g_optimized = G_ENCODER.inverse_transform([ int(model.solution[0][1]) ])[0]
print(f"Best c = {c_optimized}, g = {g_optimized}")

Best regards

Doradx commented 3 years ago

Thanks @thieu1995. I have used a similar approach for this problem before, but the results were not good. Thanks for your reply.

thieu1995 commented 3 years ago

Hi @Doradx, then why do you want to use metaheuristics to solve this problem? You can just use brute force and search for the best hyperparameters.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, ParameterGrid
from sklearn.svm import SVR
from sklearn.metrics import r2_score,mean_squared_error

# Load the Data
df = pd.read_csv('Support-Vector-Regression-Data.csv')
x = df.x.values.reshape(-1, 1)
y = df.y.values.reshape(-1, 1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=42)

list_coefficients = {
    "c": [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0],
    "g": ['scale', 'auto']
}
for item in list(ParameterGrid(list_coefficients)):
    svr_model = SVR(C=item["c"], gamma=item["g"])
    svr_model.fit(x_train, y_train.ravel())
    y_pred = svr_model.predict(x_test)
    mse =  mean_squared_error(y_test, y_pred)
    rmse = np.sqrt(mse)
    item["rmse"] = rmse

    # Save item to csv file 

# Open csv, sort them based on RMSE column
# Get your best "c" and "g" value.
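
One possible way to do those last two steps with pandas instead of opening the CSV by hand (a sketch of my own, reusing list_coefficients, x_train/x_test and y_train/y_test from above; the output file name is arbitrary):

import pandas as pd

results = []
for item in ParameterGrid(list_coefficients):
    svr_model = SVR(C=item["c"], gamma=item["g"])
    svr_model.fit(x_train, y_train.ravel())
    y_pred = svr_model.predict(x_test)
    item["rmse"] = np.sqrt(mean_squared_error(y_test, y_pred))
    results.append(item)

df_results = pd.DataFrame(results).sort_values("rmse")    # best setting on top
df_results.to_csv("svr_grid_results.csv", index=False)
print(df_results.iloc[0])                                 # best "c" and "g" value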

The problem with metaheuristic algorithms is that they are approximation methods: you can never know whether the outcome is the global best or not. So the results may not be good in some cases.

Doradx commented 3 years ago

@thieu1995 In fact, sklearn provides GridSearchCV and RandomizedSearchCV to solve this problem. But many researchers have also used PSO, GWO, and ABC for this job. This library provides a lot of metaheuristic algorithms, so I want to compare their performance in hyperparameter optimization. Thanks for your kind reply!

thieu1995 commented 3 years ago

@Doradx FYI

In fact, brute force is exactly what GridSearchCV does. The results of GridSearchCV will always be at least as good as RandomizedSearchCV or metaheuristics, simply because it tries every possible combination in the grid. You can only fairly compare the results of RandomizedSearchCV and metaheuristics.

RandomizedSearchCV samples parameters from a specified distribution, while metaheuristics use their own operators to select parameters. Both of them are approximation methods.
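
For illustration, here is a minimal sketch of the RandomizedSearchCV side (my own example, reusing the x_train/y_train split from the SVR snippets above; n_iter, cv and the distribution are arbitrary choices):

from scipy.stats import uniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

param_distributions = {
    "C": uniform(loc=0.25, scale=1.75),   # samples C uniformly from [0.25, 2.0]
    "gamma": ["scale", "auto"],           # list entries are sampled uniformly
}
search = RandomizedSearchCV(SVR(), param_distributions, n_iter=20,
                            scoring="neg_root_mean_squared_error", cv=3, random_state=42)
search.fit(x_train, y_train.ravel())
print(search.best_params_, search.best_score_)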

Doradx commented 3 years ago

@thieu1995 thanks for your reply. I have dealt with it.