thouska / spotpy

A Statistical Parameter Optimization Tool
https://spotpy.readthedocs.io/en/latest/
MIT License

[WIP] added functionality to test current implementation of nsgaii algorithm and new version of nsgaii #246

Closed: iacopoff closed this pull request 3 years ago

iacopoff commented 4 years ago

Hi @thouska, I have added the option to skip the simulation of the parent population after it is combined with the offspring population, and I am wondering whether this should actually be the default. It gives a large performance gain when the simulation model is heavy (such as VIC). Regarding the burn-in phase, I am not sure I understand what you mean by that (it is not in the paper, by the way); do you think it should be included in nsga-ii?
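
A minimal sketch of the idea (hypothetical names, not the actual nsgaii.py code): the combined population Rt = [Pt; Qt] would normally cost 2*n_pop model runs per generation, but the parents' objective values are already known from the previous generation, so only the offspring need to be simulated:

    import numpy as np

    def evaluate_combined(parent_objectives, offspring, simulate):
        # parent_objectives: (n_pop, n_obj) values cached from the previous
        # generation, so the parents cost no additional model runs.
        # offspring: (n_pop, n_par) offspring parameter sets Qt.
        # simulate: callable mapping one parameter set to n_obj values.
        offspring_objectives = np.vstack([simulate(q) for q in offspring])
        # objective values of Rt = [Pt; Qt], obtained with n_pop instead
        # of 2*n_pop simulations
        return np.vstack([parent_objectives, offspring_objectives])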

thouska commented 4 years ago

Hi @iacopoff, ok great, the nsgaii.py looks good to me. I added a small example to the tutorial_nsgaii.py. I think for now we can skip the burn-in phase; the closer we can get to the published version, the better. However, skipping the parameter duplicates sounds like a solid idea to me; if you want, we can make that the default. If so, it needs a comment and an option to deactivate it. Where I was wondering: the objective functions that are currently used are Agreement Index, RMSE and Coefficient of Variation. Is the algorithm minimizing (which would work for RMSE) or maximizing (which would work for the others)? After that: next step, include the algorithm with a comment that it is in beta or alpha status? Or should we work on anything else before that?

iacopoff commented 4 years ago

@thouska, I may be wrong, but it seems that in the tutorial_nsgaii.py you are still using the "old" version of nsgaii? I was developing the new one under nsgaii_dev (my bad if that was not clear!). The algorithm is minimising; there is a script, tutorial_nsgaii_dev.py, where it is used with the mean absolute error to optimise the dtlz1 problem. Regarding the burn-in phase, if you like we can talk about it when I am back from holiday in September? I would release it as alpha if you agree. I have a couple more days to work on it before I go on holiday, so I can make any changes or comments before then. thanks
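
Since the algorithm minimises every objective, maximisation metrics such as the Agreement Index need a sign flip in the spot_setup's objectivefunction. A minimal sketch of that convention (a hypothetical spot_setup fragment, not the actual tutorial code):

    import spotpy.objectivefunctions as objf

    class spot_setup_sketch:
        # hypothetical fragment; only the objectivefunction is shown
        def objectivefunction(self, simulation, evaluation):
            # NSGA-II minimises every entry of the returned list, so metrics
            # that are better when larger (e.g. the Agreement Index) are
            # negated before being handed back to the sampler
            return [objf.rmse(evaluation, simulation),
                    -objf.agreementindex(evaluation, simulation)]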

thouska commented 4 years ago

Hi @iacopoff, oh yes, that's true. I was getting an error with the development version. But maybe I should report it, instead of switching to the old version :) I changed the tutorial_nsgaii.py so that it is using the nsgaii_dev.py. Using your example with the dtlz1 optimization problem works fine. However, if I switch to the hymod example, I get the following error:

File "spotpy\algorithms\nsgaii_dev.py", line 271, in sample Of = np.vstack([i[2] for i in Of]).reshape(self.n_pop,self.n_obj)

ValueError: cannot reshape array of size 43830 into shape (30,3)

Do you have any idea, why this is happening?

iacopoff commented 4 years ago

Hi @thouska, Of stands for "objective functions" and it should be a list of the results from self.repeat:

    param_generator = ((i, Pt[i,:]) for i in range(self.n_pop))
    Of = list(self.repeat(param_generator))

each element of the list should contain:

(index, parameters, obj-functions)

and that's why I am using index [2] to select the objective functions:

    Of = np.vstack([i[2] for i in Of]).reshape(self.n_pop, self.n_obj)
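
For example (hypothetical values with n_pop=2 and n_obj=3), this is what the list and the stacked array would look like:

    import numpy as np

    # each tuple is (index, parameters, obj-functions)
    Of = [(0, [1.0, 2.0], [0.10, 0.20, 0.30]),
          (1, [1.5, 2.5], [0.40, 0.50, 0.60])]
    Of = np.vstack([i[2] for i in Of]).reshape(2, 3)
    # -> array([[0.1, 0.2, 0.3],
    #           [0.4, 0.5, 0.6]])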

The issue is that with dtlz1 it works fine, but in the hymod example index [2] seems to contain the simulated time series instead of the objective functions (note that 30 * 1461 = 43830, i.e. one full simulated series per individual rather than 3 objective values), and that breaks the code.

What do you think about that?

iacopoff commented 4 years ago

A quick update: for some reason the objectivefunction is not called by the algorithm in the hymod version. I will investigate later; however, you may already know what is possibly causing that?

thouska commented 4 years ago

Hi @iacopoff, thanks for the details, indeed I have a feeling where the error is: this stuff here

    param_generator = ((i,Pt[i,:]) for i in range(self.n_pop))
    Of = list(self.repeat(param_generator))

is returning (index, parameters, simulation_results) and not the objective functions. In the dtlz1 problem the simulation result is basically the objective function itself, which may have resulted in this confusion (the same holds for the rosenbrock/ackley/griewank optimization problems, where the objective function basically tells how far the results are from the optimal point). What we need instead is a call to postprocessing, which takes the simulation_results and the observed_data and puts them into the "def objectivefunction". So, the solution is probably something like this:

    param_generator = ((i, Pt[i,:]) for i in range(self.n_pop))
    # self.repeat yields (index, parameters, simulation_results) per individual;
    # igen is the generation counter from the enclosing sample loop
    Of = []
    for index, parameters, simulation_results in self.repeat(param_generator):
        Of.append(self.postprocessing(self.iter, parameters, simulation_results, chains=igen))

iacopoff commented 4 years ago

@thouska, ok I got that, thanks!

I think I need to refactor the code a bit, though.

I have another question:

This piece of code in _algorithm.py, in the postprocessing function (in particular line 301):

    if type(like) == type([]):
        return like[0]
    else:
        return like

is basically saying that if like is a list, then take the first item. This causes self.postprocessing to return 1 objective function instead of 3.

I was wondering: how do you handle that in the other multi-objective algorithms?

thanks for your help!

thouska commented 4 years ago

Oh, good point. I think I have to do some refactoring of the code :) The idea was that spotpy is able to return several objective functions even if only the first one is used for optimization. I think I would like to keep this behavior. However, you are completely right, this part is not working in a multi-objective calibration. I think the easiest way would be that the multi-objective optimization algorithms (nsgaii and padds) tell _algorithm.py that they need the full like list back. I can implement that tomorrow.
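
A minimal sketch of that idea (hypothetical attribute name and simplified signature, not the actual _algorithm.py code):

    class _algorithm(object):
        def __init__(self):
            # multi-objective samplers (nsgaii, padds) would set this to True
            # so that the full list of objective function values is returned
            self.optimize_all_objectives = False

        def postprocessing(self, like):
            # single-objective samplers keep the old behavior: only the first
            # entry of a list of likes is used for optimization
            if isinstance(like, list) and not self.optimize_all_objectives:
                return like[0]
            return like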

iacopoff commented 4 years ago

great, thanks!

Also, in the mutation function of the nsgaii algorithm I am using the min and max bounds of the parameters, and currently I am using self.setup.params to get those, whilst in the tutorial the parameters are defined at the start of the spot_setup class.

Do you think I could add a def parameters to setup_hymod_python.py, or add a try/except in the nsgaii algorithm to handle this case?

thouska commented 4 years ago

Ok, all objectivefunction values should now be returned if you call self.postprocessing(..) in the nsgaii algorithm.

Regarding the min/max boundaries of the parameters: there are actually several supported ways in which the parameters can be defined in the spot_setup class. As long as you use the final self.parameter object of spotpy, you can be sure that it does not matter how/where the parameters are defined in the spot_setup class; the min/max settings will be available either way. I changed this part in nsgaii too. Hope that covers your point?
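
For illustration, the pattern other spotpy samplers use to read the bounds through that object (shown here out of context; self is assumed to be a sampler instance inheriting from _algorithm):

    # self.parameter() collects the parameters no matter how they were
    # defined in the spot_setup class and exposes their bounds as fields
    self.min_bound = self.parameter()['minbound']
    self.max_bound = self.parameter()['maxbound']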

iacopoff commented 4 years ago

Hi @thouska, thanks for the changes. It looks like it is working fine now; I think I still need to check the skip_duplicates option further in terms of performance.

Please try to run the tutorial, and if you want you can check the resulting Pareto front by running the plot_nsgaii_tutorial.py script.

thanks!

thouska commented 4 years ago

Indeed, looks very nice, it's working and, on top, seems to be quite powerful! I updated the code a bit to make it compatible with the master branch and added a unittest. Further, I renamed nsgaii_dev.py to nsgaii.py; I guess it's about time, as the old nsgaii.py version won't be used any more, now that you provided this super nice development. The unittest is running fine as long as I set skip_duplicates to True. If it is set to False, the algorithm produces too many repetitions, e.g. like this:

    generations = 40
    n_pop = 20
    skip_duplicates = False
    sampler = spotpy.algorithms.NSGAII(spot_setup(self.multi_obj_func), parallel=self.parallel, dbname='Rosen', dbformat=self.dbformat, sim_timeout=self.timeout)
    sampler.sample(generations, n_obj=3, n_pop=n_pop, skip_duplicates=skip_duplicates)
    results = sampler.getdata()
    self.assertEqual(len(results), generations*n_pop)

results in an AssertionError: 1540 != 800

iacopoff commented 3 years ago

Hi @thouska, it was an easy fix actually; now it is working fine. It seems that travis-ci has failed on the padds algorithm, though; however, I suppose that is not related to the nsgaii algorithm.