PBO (policy-based optimization) is a degenerate policy-gradient algorithm for black-box optimization. It shares traits with both DRL (deep reinforcement learning) policy-gradient methods and ES (evolution strategies) techniques. This repository presents a parallel PBO algorithm with full covariance matrix adaptation, along with a few demonstrative applications. The related pre-print can be found here, and the formal paper here. The paper formalizes the approach used in previous related works.
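To give a feel for how this works, below is a minimal, self-contained sketch of one PBO-style generation, not the repository's actual implementation: the policy is a multivariate Gaussian over the design variables, candidates are sampled and scored, and an advantage-weighted log-likelihood gradient updates both the mean and the full covariance. All names and hyper-parameters are illustrative.

```python
import numpy as np

def pbo_step(mean, cov, cost_fn, n_ind=8, lr=0.05, rng=None):
    """One simplified PBO-style generation (illustrative sketch):
    sample candidates from a Gaussian policy, score them, and follow
    an advantage-weighted policy gradient on mean and covariance."""
    rng = rng or np.random.default_rng()
    x = rng.multivariate_normal(mean, cov, size=n_ind)    # candidate individuals
    costs = np.array([cost_fn(xi) for xi in x])
    adv = -(costs - costs.mean()) / (costs.std() + 1e-8)  # lower cost => higher advantage

    d = x - mean
    inv_cov = np.linalg.inv(cov)
    # Natural gradient w.r.t. the mean reduces to advantage-weighted displacements
    new_mean = mean + lr * (adv[:, None] * d).mean(axis=0)
    # Vanilla log-likelihood gradient w.r.t. the covariance of a Gaussian
    grad_cov = sum(a * 0.5 * (inv_cov @ np.outer(di, di) @ inv_cov - inv_cov)
                   for a, di in zip(adv, d)) / n_ind
    # Project the update back onto symmetric positive-definite matrices
    new_cov = cov + lr * grad_cov
    w, v = np.linalg.eigh(0.5 * (new_cov + new_cov.T))
    return new_mean, (v * np.clip(w, 1e-6, None)) @ v.T

# Usage: minimize a simple parabola, as in the first test case below
mean, cov = np.array([2.5, 2.5]), np.eye(2)
for _ in range(100):
    mean, cov = pbo_step(mean, cov, lambda v: float(np.sum(v ** 2)))
```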
After cloning the package, just `cd` inside the folder and install using:

```
pip install -e .
```
The environments from the paper are available in the `envs/` folder. For each `.py` environment file, you need a `.json` parameter file located in the same directory. To run an environment, just use:

```
pbo path/to/envs/my_env.json
```
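For illustration, a parameter file is just plain JSON; the snippet below writes one with hypothetical keys (the actual keys expected by `pbo` are those found in the repository's `envs/*.json` examples):

```python
import json

# All keys below are hypothetical placeholders, not the actual
# parameters expected by pbo; refer to the shipped envs/*.json files.
params = {
    "n_gen": 100,  # hypothetical: number of generations
    "n_ind": 8,    # hypothetical: individuals per generation
}
with open("envs/my_env.json", "w") as f:
    json.dump(params, f, indent=2)
```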
Below are some selected optimization cases performed with the algorithm.
Simple minimum-finding on textbook analytical functions (see more in the paper).
We consider the minimization of a parabola defined on `[-5,5] x [-5,5]`. Below is the course of a single run, generation after generation, with a starting point in `[2.5,2.5]`:
The Rosenbrock function is here defined on `[-2,2] x [-2,2]`. It contains a very narrow valley, with a minimum at `[1,1]`; the shape of this valley makes it a hard optimization problem for many algorithms. Here is the course of a single run, generation after generation, with a starting point in `[0.0,-1.0]`:
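For reference, the standard two-dimensional Rosenbrock function (the variant matching the minimum at `[1,1]` mentioned above) can be written as follows, ready to be dropped into a black-box optimizer such as the sketch above:

```python
def rosenbrock(v):
    """Standard 2D Rosenbrock function; global minimum f = 0 at [1, 1].
    The long, curved, nearly-flat valley floor is what makes progress slow."""
    x, y = v
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2
```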
Optimal packing of geometrical shapes (see here for much, much more). The goal is to pack a set of unit shapes into the square of smallest side `s`. For most cases, the optimal solutions are already known.
| Environment | Description |
|---|---|
| 6 circles in square (12 degrees of freedom) | The optimal value is s = 5.328+; best PBO result was s = 5.331 after 800 generations |
| 10 circles in square (20 degrees of freedom) | The optimal value is s = 6.747+; best PBO result was s = 6.754 after 1000 generations |
| 3 equilateral triangles in square (9 degrees of freedom) | The optimal value is s = 1.478+; best PBO result was s = 1.478+ after 1000 generations |
| 5 equilateral triangles in square (15 degrees of freedom) | The optimal value is s = 1.803+; best PBO result was s = 1.807+ after 2500 generations |
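To illustrate how such packings can be cast as black-box costs (a sketch under simple assumptions, not the repository's environment code): for unit-radius circles, the free parameters are the 2n center coordinates, the objective is the side of the smallest enclosing square, and overlaps are discouraged by a penalty term.

```python
import numpy as np

def circle_packing_cost(params, radius=1.0, penalty=100.0):
    """Illustrative cost for packing unit circles into the smallest square.
    params: flat array [x0, y0, x1, y1, ...] of circle centers
    (2 degrees of freedom per circle, e.g. 12 for the 6-circle case)."""
    c = np.asarray(params).reshape(-1, 2)
    # Side of the smallest axis-aligned square enclosing every circle
    side = max(np.ptp(c[:, 0]), np.ptp(c[:, 1])) + 2.0 * radius
    # Quadratic penalty on every pair of overlapping circles
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
    i, j = np.triu_indices(len(c), k=1)
    overlap = np.clip(2.0 * radius - d[i, j], 0.0, None)
    return side + penalty * np.sum(overlap ** 2)
```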
We consider the equations of the Lorenz attractor with a velocity-based control term:
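As a reminder, the classical Lorenz system reads as follows; note that placing the scalar control term u in the second equation is an assumption made here for illustration, the exact formulation being the one of the paper:

$$
\begin{aligned}
\dot{x} &= \sigma \, (y - x), \\
\dot{y} &= x \, (\rho - z) - y + u, \\
\dot{z} &= x y - \beta z,
\end{aligned}
$$

with the usual chaotic parameter choice $\sigma = 10$, $\rho = 28$, $\beta = 8/3$.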
We make use of a non-linear control term `u` with four free parameters (its exact expression is given in the paper).
Two control cases are designed: the first forces the system to stay in the x < 0 quadrant, while the second maximizes the number of sign changes of x (both cases are inspired by this thesis). Below is a comparison between the two controlled cases.
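As a sketch of how the two reward signals can be evaluated (the control law below is a hypothetical four-parameter stand-in, not the expression used in the paper):

```python
import numpy as np
from scipy.integrate import solve_ivp

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0  # classical Lorenz parameters

def controlled_lorenz(t, s, p):
    x, y, z = s
    # Hypothetical non-linear control with four free parameters p[0..3];
    # the actual control law of the paper differs.
    u = p[3] * np.tanh(p[0] * x + p[1] * y + p[2] * z)
    return [SIGMA * (y - x), x * (RHO - z) - y + u, x * y - BETA * z]

def evaluate(p, t_max=25.0, n_steps=5000):
    t = np.linspace(0.0, t_max, n_steps)
    sol = solve_ivp(controlled_lorenz, (0.0, t_max), [10.0, 10.0, 10.0],
                    t_eval=t, args=(p,), rtol=1e-8)
    x = sol.y[0]
    reward_quadrant = np.mean(x < 0.0)                # case 1: time spent with x < 0
    reward_switch = np.sum(np.diff(np.sign(x)) != 0)  # case 2: number of sign changes of x
    return reward_quadrant, reward_switch
```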