Add mono- and multi-objective optimization support

Pull request overview

Closes #17
Add a new OptimJob class that can integrate GA optimization via mco (for MOO)/GPareto (for GP-based MOO) (or GA (for Mono-obj optim) with eplusr framework.

Details

Support monitor parameter like GA::ga() to show the evolution of the search:
- plot: plot information on a graphical device
- print: print the average and best fitness values at each iteration
OptimJob$pareto_front() method to extract Pareto front values
OptimJob$pareto_set() method to extract Pareto set values
OptimJob$plot() method to plot Pareto front

Proposed Interfaces

OptimJob <- R6Class("OptimJob", inherit = eplusr::ParametricJob,
    public = list(
        # create inherited objects OptimJob_NSGA2, OptimJob_GP, OptimJob_GA
        initialize = function (idf, epw, method = c("nsga2", "GPareto", "GA")),

        # same interface as `eplusr::ParametricJob$apply_measure()`
        # 1. use parameter number as a criterion to determine this is a MOO or
        #    Mono-objective optimization problem. If multiple argument, gives an
        #    error for `OptimJob_GA` class
        # 2. TODO: new better interface
        apply_measure = function (...),

        # 1. functions served as objective functions. Each function takes an
        #    `EplusJob` object as input with function output serves as
        #    one fitness value.
        #
        # 2. tabular value specifications in a data.frame format giving
        #    report_name, report_for, table_name, column_name, row_name.
        #
        # If `.prerun` is `TRUE`, run DDY simulation on the seed model (or first
        # parametric model ?) to make sure objective specification only return a
        # scalar value. If no DDY is found in current model, gives an error.
        objective = function (..., .prerun = TRUE),

        # This is just a placeholder. The actual arguments it takes depend on
        # the `method` argument input during object initialization.
        #
        # `monitor`: provide similar interface to monitor optimization process
        # like `GA::ga()`
        # 1. A logical or an R function which takes as input the current state
        #    of the OptimJob and show the evolution of the search.
        #    By default, for interactive sessions prints the average and best
        #    fitness values at each iteration.
        # 2. If set to ‘plot’ these information are plotted on a graphical
        #    device. Other functions can be written by the user and supplied as
        #    argument.
        # 3. In non interactive sessions, by default ‘monitor = FALSE’ so any
        #    output is suppressed.
        optimize = function (..., .monitor = "plot"),

        # similar interface as mco::paretoFront
        # only applicable to OptimJob_GP and OptimJob_NSGA2
        pareto_front = function (),

        # similar interface as mco::paretoSet
        # only applicable to OptimJob_GP and OptimJob_NSGA2
        pareto_set = function (),

        # similar interface as mco::paretoFilter
        # only applicable to OptimJob_GP and OptimJob_NSGA2
        pareto_filter = function (),

        # plot results
        # For OptimJob_GP and OptimJob_NSGA2, use `pareto_front` and
        # `pareto_set` to control whether to plot pareto front and pareto set
        # respectively
        plot = function (pareto_front = TRUE, pareto_set = TRUE),

        # Instead of merely printing EnergyPlus simulation statuses, provide a
        # similar interface as GA::plot.ga() to get a summary of optimization
        # results
        print = function ()
    )
)

Current implementation heavily takes advantage of the ecr (Evolutionary Computation in R) package.

Create a GAOptimJob object

path_idf <- file.path(eplusr::eplus_config(8.8)$dir, "ExampleFiles", "5Zone_Transformer.idf")

opt <- gaoptim_job(path_idf, NULL)

Define parameters and decision spaces using `GAOptimJob$apply_measure()` interface

First create a measure function

rotate_building <- function (idf, degree = c(-360, 360)) {
    cur <- idf$Building$North_Axis
    new <- cur + degree
    idf$Building$North_Axis <- new
    idf
}
roof_insul <- function (idf, thickness) {
    idf$Material$'IN46'$Thickness <- thickness
    idf
}
outsurf_conv <- function (idf, algo) {
    idf$SurfaceConvectionAlgorithm_Outside$Algorithm <- algo
    idf
}
solar_dist <- function (idf, dist) {
    idf$Building$set(Solar_Distribution = dist)
    idf
}
# combine them all
change <- function (idf, degree, dist, thickness, algo) {
    rotate_building(idf, degree)
    solar_dist(idf, dist)
    roof_insul(idf, thickness)
    outsurf_conv(idf, algo)
    idf
}

Plug-in measure function. Unlike eplusr::ParametricJob$apply_measure(), GAOGAOptimJob$apply_measure is mainly used for specifying decision dimensions, no parametric IDF will be created at this step

Note the use of helper functions float_space() and choice_space(). There is also an integer_space() function. Basically for integer_space() and choce_space() they takes a discrete vector as input, and an optional single initial value init (currently not used), while float_space() defines a continuous parameter search space giving its min, max and init.

opt$apply_measure(change,
    degree = float_space(min = -360, max = 360, init = 0),
    thickness = float_space(0.02, 0.10),
    algo = choice_space(c(
        "SimpleCombined", "TARP", "MoWiTT",
        "DOE-2", "AdaptiveConvectionAlgorithm"
    )),
    dist = choice_space(c(
            "MinimalShadowing", "FullExterior",
            "FullInteriorAndExterior", "FullExteriorWithReflections",
            "FullInteriorAndExteriorWithReflections"
    ), init = "MinimalShadowing")
)

Create objective functions

Current implementation works for both single- and multi-objective problems. The number of objectives are determined by both input function number and its return value.
Take function zone_loads below as an example. Everying objective function should have a first parameter named idf and an optional parameter param. When evaluated, the input idf will be the parametric model, and param will be actual values of degree, thickness, algo and dist (stored in a list) that are used to create it.

# define a fitness function of total zone loads
zone_loads <- function (idf, param) {
    sum(idf$last_job()$read_table("ZoneSizes")$user_des_load)
}

[DRAFT, NOT IMPLEMENTED YET] By add an other argument bare = TRUE or bare = FALSE, You can determine whether input idf in the objective shoudl be a bare eplusr::Idf object, or an Idf that has already called $run() method with the seed EPW file, e.g. zone_loads <- function (idf, bare = TRUE) {...}. If there is no bare parameter defined or if bare is FALSE, you can get access to the last simulation job (i.e. eplusr::EplusJob object) via eplusr::Idf$last_job(). Through this mechanics, you are able to access both the parametric IDF and also its simulation results.
After we have the objective functions properly defined, we can plug it in using GAOptimJob$objective(). You can specify multiple objective functions. As described above, the total objecitve number will be determined by both the functions you pass to GAOptimJob$objective() and also the actual return value numbers. E.g. if zone_loads returns a 2-length numeric vector, even though only one function is given, you are still defining a multi-objective optimization problem.

opt$objective(zone_loads)

By default, the optimization direction is set to minization to all objectives. But you can change this by setting dir to "max". If you want oppisite the optimization direction of a specific objective, just put a minus sign - before the function, e.g. opt$objective(-zone_loads).
Before running the actual optimization, GAOptimJob will first run $validate() method which checks if all required fields are filled and it will also create a sample parametric IDF and apply all specified objective functions to it. After this, the actual objective number is checked and stored internally.

Specify GA components

There are several important components that influence how GA works and each of them comes with a separated method in GAOptimJob class:

$recombinator(): Recombine/Crossover/Mate
$mutator(): Mutate
$selector(): Parent + Child selection
$terminator(): On what conditions should the GA stops

They all come along with reasonable defaults for setting up a multi-objective optimization problem using the famous NSGA-II method. That is to say, if you want to define a single-objective optimization problem, do remember to change value of parameter survival which has a default value ecr::selNomdom (Non-dominated sorting) that only works for >=2 dimentions.

opt$selector(survival = ecr::selGreedy)
opt$terminator(max_gen = 5)

Run optimization

With all things properly set up, starting to run the optimizatin is simply a call of the GAOptimJob$run() method.
You only need to specify the outout directory, where all parametric models and simulation results will be saved in a Gen[i]/Gen[i]-Ind[j] fashion. Here i and j is the generation index and individual index in that generation.
By default the parallel parameter is set to TRUE, which means that fitness evaluation is conducted in parallel, using the future.apply package. You can do it consequently by setting it to FALSE. Using future.apply framework for parallazation instead of eplusr's one is that future family packages (future.apply, future.batchtools, future.BatchJob and etc) provide a consistent via powerful API that you can easily move your calculation to not local parallelization, but also to large clusters, like Slurm, SGE, HPC and etc.

opt$run(dir = tempdir(), parallel = TRUE)

After the optimization completed, an ecr_reulst object will be returned, which is basically a named list containing different elements for differnt dimention optimizatino problems 1. Single objective: task, best.x, best.y, log, last.population, last.fitness, and message 2. Multiple objective: task, pareto.idx, pareto.front, pareto.set, last.population, and message

hongyuanjia / epluspar