tudat-team / tudat-space

An introduction and guide to tudat-space.
https://tudat-space.readthedocs.io

Improve discussion on multi-threading in "Optimisation with Pygmo" #73

Open gaffarelj opened 2 years ago

gaffarelj commented 2 years ago

In the Optimization with PyGMO section of the docs, I think it would help to add a small discussion about how to properly multi-thread with Python and Pygmo.

I find multi-threading really useful for astrodynamics optimisation problems, since multiple tudat propagations can then be run simultaneously as batch fitness evaluations, instead of one after another.

Here are a few elements that I have tested on Linux, which allow for multi-threading in Pygmo without running into issues with Pygmo, tudat, or Spice:

An example with all of this implemented can be found in my repo, with my own versions of run_opti.py (which I called run_problem.py) and opti_framework.py (which I called drag_comp_problem.py).
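As a rough sketch of what the batch fitness evaluation side can look like: a Pygmo user-defined problem is a plain Python class, and adding a batch_fitness method lets it receive many decision vectors at once, which can then be evaluated in parallel. The BatchProblem class and the _evaluate sphere function below are hypothetical stand-ins for an actual tudat propagation problem (such as drag_comp_problem.py), not code from the repo; the "fork" start method is used to keep the sketch self-contained on Linux.

```python
import multiprocessing as MP
import numpy as np

def _evaluate(dv):
    # Stand-in for a single tudat propagation; returns one fitness value
    return float(np.sum(np.asarray(dv) ** 2))

class BatchProblem:
    """Toy Pygmo-style user-defined problem with batch fitness evaluation."""

    def fitness(self, dv):
        # Single-vector evaluation, as required of any Pygmo problem
        return [_evaluate(dv)]

    def get_bounds(self):
        # Two decision variables, each in [-10, 10]
        return ([-10.0, -10.0], [10.0, 10.0])

    def batch_fitness(self, dvs):
        # Pygmo passes all decision vectors concatenated into one flat array;
        # reshape it into one row per decision vector
        dim = len(self.get_bounds()[0])
        dv_matrix = np.asarray(dvs).reshape(-1, dim)
        # Evaluate the whole batch in parallel ("fork" keeps this sketch
        # simple on Linux; "spawn" is the more portable choice)
        with MP.get_context("fork").Pool(2) as pool:
            fitnesses = pool.map(_evaluate, dv_matrix)
        # Return the fitnesses concatenated back into one flat array
        return np.asarray(fitnesses)

if __name__ == "__main__":
    prob = BatchProblem()
    # Two decision vectors, (1, 2) and (3, 4), flattened as Pygmo would pass them
    print(prob.batch_fitness([1.0, 2.0, 3.0, 4.0]))
```

Algorithms that support batch evaluation can then be pointed at such a problem, so that each generation is propagated as one parallel batch rather than one simulation at a time.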

gaffarelj commented 1 year ago

Just to add to this: parallelizing simulations outside of the Pygmo framework is quite straightforward. I pieced together the essentials below (which work with tudatPy and SPICE):

import multiprocessing as MP
import numpy as np

from tudatpy.kernel.numerical_simulation import environment_setup, propagation_setup
from tudatpy.kernel.interface import spice # This setup works with SPICE :)

def run_simulation(arg_1, arg_2):
    # Do some tudat things...
    # Even better, save results directly into a database using sqlite3 to avoid
    # holding all results in memory
    return 1, arg_1 + arg_2

if __name__ == "__main__":
    # Number of simulations to run
    N = 500
    arg_1_list = np.random.normal(-100, 50, size=N)
    arg_2_list = np.random.normal(1e6, 2e5, size=N)

    # Combine the inputs into a list of tuples
    inputs = list(zip(arg_1_list, arg_2_list))

    # Run simulations in parallel, using half the available cores
    n_cores = MP.cpu_count()//2
    with MP.get_context("spawn").Pool(n_cores) as pool:
        outputs = pool.starmap(run_simulation, inputs)
    # Outputs is a list of tuples (just like inputs); each tuple contains the output of a simulation

    # Note: the memory is freed only after all the outputs are collected.
    # It is therefore wise to split the list of inputs into smaller batches
    # when a high number of simulations is run, to avoid exhausting the memory.
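That batching suggestion can be sketched as follows. The chunked and run_in_batches helpers and the chosen batch size are illustrative, not part of the snippet above; run_simulation is the same kind of stand-in, and "fork" is used here for simplicity on Linux (the snippet above uses the more portable "spawn" context).

```python
import multiprocessing as MP

def run_simulation(arg_1, arg_2):
    # Stand-in for a tudat propagation
    return 1, arg_1 + arg_2

def chunked(seq, size):
    # Yield successive slices of at most `size` elements
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def run_in_batches(inputs, batch_size, n_cores):
    outputs = []
    for batch in chunked(inputs, batch_size):
        # One pool per batch, so worker memory is released between batches
        with MP.get_context("fork").Pool(n_cores) as pool:
            outputs.extend(pool.starmap(run_simulation, batch))
        # The outputs of the finished batch could also be flushed to disk
        # here (e.g. with sqlite3) instead of being accumulated in memory
    return outputs

if __name__ == "__main__":
    inputs = [(i, 10 * i) for i in range(10)]
    results = run_in_batches(inputs, batch_size=4, n_cores=2)
    print(len(results))  # 10 results, produced 4 + 4 + 2 at a time
```

Writing each batch to disk inside the loop, rather than extending the outputs list, is what actually caps the memory footprint when the number of simulations is large.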