keurfonluu / stochopy

Python library for stochastic numerical optimization
https://github.com/keurfonluu/stochopy
BSD 3-Clause "New" or "Revised" License

Using scipy's DE genetic algorithm for initial parameter estimation in non-linear regression #1

Closed zunzun closed 7 years ago

zunzun commented 7 years ago

I see you have written optimization code using Python and Differential Evolution. SciPy has solvers for fitting non-linear equations; however, SciPy's default initial parameter values are all 1.0. This can be suboptimal and, for more complex equations, often results in the algorithms finding a local minimum in error space. For this reason, the SciPy authors added the Differential Evolution genetic algorithm for initial parameter estimation in optimization. The module is named scipy.optimize.differential_evolution.

Because their implementation uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, this particular SciPy module requires bounds on the parameters.
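For illustration, here is a minimal, hypothetical sketch of that workflow (the exponential model, synthetic data, and bounds below are my own placeholders, not anything from the Raman project): differential_evolution searches within the bounds, and its result seeds curve_fit in place of the default initial guess of all ones.

import numpy as np
from scipy.optimize import differential_evolution, curve_fit

# synthetic data and a simple exponential decay model, purely illustrative
xData = np.linspace(0.0, 5.0, 50)
yData = 2.5 * np.exp(-1.3 * xData) + 0.05 * np.random.normal(size=xData.size)

def model(x, a, b):
    return a * np.exp(-b * x)

def sumOfSquaredError(params):
    return np.sum((yData - model(xData, *params)) ** 2.0)

# bounds are mandatory because of the Latin Hypercube population initialization
bounds = [(0.0, 10.0), (0.0, 10.0)]
geneticResult = differential_evolution(sumOfSquaredError, bounds, seed=3)

# the genetic estimates replace curve_fit's default p0 of all ones
fittedParameters, _ = curve_fit(model, xData, yData, p0=geneticResult.x)
print(fittedParameters)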

I have used scipy's Differential Evolution genetic algorithm to determine initial parameters for fitting a double Lorentzian peak equation to Raman spectroscopy data of carbon nanotubes and found that the results were excellent. The GitHub project, with a test spectroscopy data file, is:

https://github.com/zunzun/RamanSpectroscopyFit

My background is in nuclear engineering and industrial radiation physics, and I love Python, so if you have any questions please let me know.

James Phillips

keurfonluu commented 7 years ago

Hello, fellow Python lover! Thank you for your interest in this project.

I am actually aware of the Differential Evolution implementation in SciPy, but I wanted to implement it myself for easier customization, as I need to use it on an SMP cluster.

I took a quick look at your RamanSpectroscopyFit code... quite an interesting idea! I have taken the liberty of using StochOPy to fit your data, simply by modifying the function generate_Initial_Parameters:

def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    from stochopy import Evolutionary

    # search bounds for the eight fit parameters
    lower = ( -1., maxY/-2., 0., 0., minX, 0., 0., minX )
    upper = ( 1., maxY/2., maxY*100., maxY/2., maxX, maxY*100., maxY/2., maxX )

    # CPSO with a population of 15 and 1000 iterations (the same
    # hyperparameters as SciPy's Differential Evolution)
    ea = Evolutionary(sumOfSquaredError, lower, upper, popsize = 15, max_iter = 1000, clip = True, random_state = 3)
    xopt, gfit = ea.optimize(solver = "cpso")
    return xopt

I used CPSO (another evolutionary algorithm, like Differential Evolution) with the same hyperparameters as SciPy's Differential Evolution (population size of 15 and 1000 iterations), and it actually gives similar results! See for yourself!

My implementation of Differential Evolution is really basic; I still have to add more strategies and more options (such as dithering). Also, the Latin Hypercube algorithm for population initialization is actually a good idea; thank you for the suggestion.

PS: I think you don't have to use curve_fit after SciPy's Differential Evolution, as it already performs a gradient-based optimization at the end (the polish option is True by default).

zunzun commented 7 years ago

Two notes.

First, I subtracted the "scipy DE" parameters from the "curve_fit()" parameters, and while some parameters are close in value, some are not. For this reason, at present I will keep the call to curve_fit(). I did not know about the "polish" option; very interesting. The difference in parameter values from my test may not be significant; I have not yet investigated.

Second, in the original paper describing Differential Evolution, the two authors state that the algorithm is easily parallelized. Unfortunately, the first implementation of DE was written by a C programmer who wanted to conserve memory and so used a single array for both the previous and next generation, changing the algorithm and making that implementation non-parallelizable. Other programmers translated this C code into their own preferred languages, keeping the single array in place. When DE won awards in competitions between different genetic algorithms - and it quickly did so - it used the changed algorithm. If you go back to the original white paper concept, DE itself can indeed easily be made to run in parallel.
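To make the distinction concrete, here is a rough sketch in my own notation (not code from either implementation): in the two-array form, each trial is built from the frozen current generation, so the loop body is independent across individuals and easy to parallelize; in the single-array form, later trials already see earlier replacements, which ties the loop to a serial order.

import numpy as np

def de_step_two_arrays(pop, fitness, objective, F=0.8, CR=0.9, rng=np.random):
    # every trial is built from the frozen "pop", so each iteration is
    # independent and could be farmed out to many CPUs
    n, d = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        a, b, c = pop[rng.choice(others, 3, replace=False)]
        trial = np.where(rng.random(d) < CR, a + F * (b - c), pop[i])
        f = objective(trial)
        if f < fitness[i]:
            new_pop[i], new_fit[i] = trial, f
    return new_pop, new_fit

def de_step_single_array(pop, fitness, objective, F=0.8, CR=0.9, rng=np.random):
    # trials later in the loop already see replacements made earlier in the
    # same generation, so the loop cannot be parallelized as written
    n, d = pop.shape
    for i in range(n):
        others = [j for j in range(n) if j != i]
        a, b, c = pop[rng.choice(others, 3, replace=False)]
        trial = np.where(rng.random(d) < CR, a + F * (b - c), pop[i])
        f = objective(trial)
        if f < fitness[i]:
            pop[i], fitness[i] = trial, f  # in-place update
    return pop, fitness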

keurfonluu commented 7 years ago

For your first point, I checked in terms of root-mean-square error (RMSE):

This difference can actually be explained by the fact that DE's polish option uses L-BFGS-B with a numerical gradient to estimate the Jacobian, while curve_fit uses the Levenberg-Marquardt algorithm. Therefore, you are right: it is better to use curve_fit after DE. Yet I performed the same test without polishing, and you can actually disable the polish option:
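A minimal sketch of that switch, using the Rosenbrock function as a stand-in objective (my own example, not taken from the thread):

from scipy.optimize import differential_evolution, rosen

bounds = [(-5.0, 5.0)] * 4
raw = differential_evolution(rosen, bounds, polish=False, seed=3)      # DE only
polished = differential_evolution(rosen, bounds, polish=True, seed=3)  # DE + L-BFGS-B
print(raw.fun, polished.fun)  # the polished objective is usually at least as good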

For your second point, I was not aware of this change. My implementation of DE is based on my understanding of the original paper by Storn and Price (1997) and not on a random code found on the Internet, which is why I said it is really basic. Do you know if this change improves the convergence rate of DE, or if it only reduces memory usage (while losing parallelizability)? Do you have any papers to recommend on this subject?

Thank you again.

zunzun commented 7 years ago

This email thread from the scipy-dev mailing list has more detail: https://mail.python.org/pipermail/scipy-dev/2015-March/020582.html

Yes, the change improves the convergence of DE considerably, but it is serial only - this is why the scipy implementation cannot be run in parallel and uses the serial version. The email thread discusses why serial convergence is improved by the (unintentional) change made to the DE algorithm. Note that the original DE implementations were only tested on single-CPU computers; multi-core CPUs were rather pricey when DE was being developed and effectively unavailable.

To get around the single-CPU serial restriction, I have seen DE implemented in a sort of pseudo-parallel mode using "islands", where the same data was run serially on different CPUs simultaneously and the "islands" then had genetic crossover of their best individuals. Using any of the existing, tested, and proven codes, there was little alternative to this option for parallel operation - short of re-implementing from scratch as you have done, which meant poorer serial performance in every case. For most researchers that would have been inexplicable and assumed to be due to some error in their new code - an error they could never find because it did not exist.
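As a loose sketch of that island idea (the setup below is hypothetical, and the migration step is reduced to a simple comparison of champions): each island runs a serial DE in its own process, and the per-island results are then compared.

import numpy as np
from multiprocessing import Pool
from scipy.optimize import differential_evolution, rosen

bounds = [(-5.0, 5.0)] * 4

def run_island(seed):
    # each island evolves its own population serially on one CPU
    result = differential_evolution(rosen, bounds, maxiter=50, polish=False, seed=seed)
    return result.x

if __name__ == "__main__":
    with Pool(4) as pool:
        champions = pool.map(run_island, [1, 2, 3, 4])
    # a real island model would migrate individuals between islands and keep
    # evolving; here we only compare the champions of each island
    best = min(champions, key=rosen)
    print("best island result:", best, rosen(best))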

keurfonluu commented 7 years ago

I just updated my package based on your remarks. I added a serial implementation of Differential Evolution with different mutation strategies (rand1, rand2, best1 and best2).
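For reference, those strategy names correspond to the standard DE mutation formulas; here is a small sketch in standard notation (my own illustration, not the package's internal code):

import numpy as np

def mutate(strategy, pop, best, idx, F=0.8):
    # idx holds at least five distinct random indices, all different from the
    # current individual; "best" is the best model of the current population
    r1, r2, r3, r4, r5 = (pop[i] for i in idx[:5])
    if strategy == "rand1":
        return r1 + F * (r2 - r3)
    if strategy == "rand2":
        return r1 + F * (r2 - r3) + F * (r4 - r5)
    if strategy == "best1":
        return best + F * (r1 - r2)
    if strategy == "best2":
        return best + F * (r1 - r2) + F * (r3 - r4)
    raise ValueError(strategy)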

First, I compared the performance of my old implementation with SciPy's and found that SciPy's outperformed mine. I therefore checked SciPy's source code and found that the only difference in the algorithm was how boundary violations were handled. In my old implementation, models were kept within the search space by clipping the parameters that violated the boundaries. In SciPy's, those parameters are simply resampled uniformly in the search space, which actually improves the population's diversity. Hence, I modified my code so that it handles boundary violations in the same way as SciPy's, which gives my code performance similar to SciPy's.
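A quick sketch of the two boundary-handling strategies described above (my own illustration, not the actual source of either package):

import numpy as np

def handle_clip(trial, lower, upper):
    # old behaviour: stick violating parameters to the nearest bound
    return np.clip(trial, lower, upper)

def handle_resample(trial, lower, upper, rng=np.random):
    # SciPy-like behaviour: redraw violating parameters uniformly within the
    # bounds, which preserves more diversity in the population
    out = trial.copy()
    bad = (trial < lower) | (trial > upper)
    out[bad] = lower[bad] + rng.random(bad.sum()) * (upper[bad] - lower[bad])
    return out

lower = np.array([-1.0, -1.0])
upper = np.array([1.0, 1.0])
trial = np.array([1.7, -0.3])
print(handle_clip(trial, lower, upper))      # -> [ 1.  -0.3]
print(handle_resample(trial, lower, upper))  # -> e.g. [ 0.42 -0.3 ]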

Now that I have made sure my code is equivalent to SciPy's, I performed several tests on benchmark functions in high dimensions (Rastrigin, Rosenbrock, ...) in both serial and parallel. Yet, I have not been able to find a drastic difference between the serial and parallel implementations.

In my opinion, another possible reason why Robert Kern's implementation outperformed other implementations is the way it handles boundary violations. I actually found several papers that discuss the impact of boundary-violation handling on the performance of an evolutionary algorithm:

Anyway, thank you for your contribution, which greatly improved my knowledge of evolutionary algorithms as well as my implementation of both DE and PSO.