thouska / spotpy

A Statistical Parameter Optimization Tool
https://spotpy.readthedocs.io/en/latest/
MIT License

SCE-UA runs over max repetitions #183

Closed rmlz closed 6 years ago

rmlz commented 6 years ago

Hi, my name is Ramon and I'm a researcher living in the city of Brasilia, Brazil.

I'm using spotpy to design a computational process to calibrate the CE-QUAL-W2 model and assess it as a water quality tool for a lake in my city.

I just finished the spotsetup() class. The inputs and outputs seem to be working very well. As CE-QUAL-W2 is a very time-expensive model, I tried to run some tests with only 5 repetitions. Even though the optimization would give a bad result, I expected SCE-UA to stop running after 5 reps, and that didn't happen. It keeps running and showing messages such as: "10 of 5 (best like=-0.219832) est. time remaining: 23:59:59".

I'm afraid I haven't understood how SCE-UA works (for example, this might be regular behavior, and I must set a larger number of reps), or I could have been doing something wrong.

An answer to this little question would help me a lot and guide my further investigation.

I appreciate it. Have a good one.

philippkraft commented 6 years ago

SCE-UA is a method where one needs (depending on the number of parameters) thousands to hundreds of thousands of runs to get a reliable result, hence we never tested 5 runs. To run a model with 5 different parameter sets, you do not need an algorithm or automation. We might be able to help you with the algorithm choice, but we need to know the following: How long does a single simulation take on your computer (minutes, hours, days)? How many parameters do you change (1, 5, 100)? How precise must your result be (just some run that is kinda ok, the definite optimum solution, or a quantified uncertainty)? Can you get access to a supercomputer at your university / research facility, which is relatively simple to use together with spotpy? Note, however, that calibrating a model usually involves a computer running continuously for several days.

For automated calibration, you always need many, many model runs. Even for few parameters (e.g. 3-5) and fast-converging algorithms, you need at least hundreds of runs. It is therefore a good idea to think about making your model faster, if that is possible. Can you use a coarser spatial resolution? I've never used CE-QUAL-W2, but spatially explicit models like that usually get much faster with fewer segments or layers.
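One possible reading of the "10 of 5" message, offered as a sketch and not as a description of spotpy's internals: in the textbook SCE-UA formulation, each complex holds 2n + 1 points for n parameters, and the whole initial population is evaluated before any evolution step. The function name and the number of complexes (`n_complexes=2`) are hypothetical choices for illustration.

```python
def sceua_initial_population(n_params, n_complexes):
    """Model runs needed before SCE-UA can start evolving its complexes.

    Assumes the textbook sizing of 2 * n_params + 1 points per complex.
    """
    if n_params < 1 or n_complexes < 1:
        raise ValueError("n_params and n_complexes must be positive")
    points_per_complex = 2 * n_params + 1
    return points_per_complex * n_complexes


budget = 5  # the repetition limit tried in the question
for n_params in (1, 5, 100):  # the example parameter counts from the reply
    needed = sceua_initial_population(n_params, n_complexes=2)
    status = "fits within" if needed <= budget else "exceeds"
    print(f"{n_params} parameters: {needed} initial runs, {status} a budget of {budget}")
```

Under these assumptions, even a one-parameter problem needs more than 5 runs just to initialize, so a repetition limit that small cannot be honored before the first population is complete.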

rmlz commented 6 years ago

@philippkraft

Thanks for your answer! I'm sorry I couldn't reply sooner. I've seen in the docs that I can try to run multiple instances of the model if I have a cluster with Linux installed. In the department where I work, we have a 32-core cluster that has been underused. It doesn't have Linux installed, and I'm not sure whether installing Ubuntu on it is as easy as it is on a personal computer. Anyway, I just started a calibration run today. I'm using the SCE-UA algorithm and I asked the script to run the calibration 1000 times. It seems that it'll finish on Monday (as long as everything works fine!).

The run I mentioned is a hydrodynamics calibration test, and we are trying to find a good set for only 3 parameters (vertical eddy diffusivity coeff, horizontal eddy diffusivity coeff, vertical light extinction coeff). Below I'll give you more info about what we have been doing for this work.

The very first objective of this work is to calibrate the temperature (vertically) at 5 points (observation posts) in Paranoá Lake. The lake is an urban, dendritic lake with one main body and 4 branches, located in the heart of Brasília (the capital of Brazil). We have developed a distributed model using CE-QUAL-W2 with 64 layers, because we'd like to test the hypothesis: "We can better understand many of the dynamics in the lake if we can see how its behavior varies across its branches."

We want the calibration run to find a parameter set that makes the model perform as well as possible for the validation period. Once it's thermally calibrated, we can investigate how its dynamics develop at the points where we don't have observed data.

Another team is researching the water quality of Paranoá Lake. However, they can't go further until we get good results for the hydrodynamics.

So, answering all your questions:

How long does a single simulation take on your computer (minutes, hours, days)? The model run time is very sensitive to the number of parameters that have been turned ON in its configuration. In our early tests, it took from 1 to 5 minutes to complete a 3-year simulation. But if you add the water quality computation, it can take 2-5x longer.
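A back-of-the-envelope estimate of the calibration time, using the 1 to 5 minutes per run quoted above and the 1000 repetitions mentioned earlier. The function name is hypothetical, and the 32-worker case assumes ideal scaling with one model run per core, which real clusters rarely reach.

```python
import math


def calibration_hours(repetitions, minutes_per_run, workers=1):
    """Rough wall-clock hours for a calibration, assuming ideal parallel scaling."""
    if repetitions < 0 or minutes_per_run < 0 or workers < 1:
        raise ValueError("invalid input")
    # Runs execute in batches of `workers`; the last batch may be partly empty.
    batches = math.ceil(repetitions / workers)
    return batches * minutes_per_run / 60.0


for minutes in (1, 5):  # fastest and slowest run times from the early tests
    sequential = calibration_hours(1000, minutes)
    parallel = calibration_hours(1000, minutes, workers=32)
    print(f"{minutes} min/run: {sequential:.1f} h on one core, "
          f"{parallel:.1f} h on 32 cores (ideal)")
```

With these numbers, 1000 sequential runs take roughly 17 to 83 hours, which is consistent with a run started before the weekend finishing on Monday.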

How many parameters do you change (1, 5, 100) and how precise must your result be (just some run that is kinda ok, or the definite optimum solution or a quantified uncertainty)? We need the definite optimum solution. For now we are trying to "mess" with 3 hydrodynamic parameters. In another approach we tried to select a parameter value for each branch of the lake, plus its main body (total: 5 regions * 3 parameters = 15 parameters). We are not sure which approach to follow, so we are keeping it as simple as possible. Later, we want to start playing with the water quality parameters (which can be more than 100 parameters for the many nutrient, algae, zooplankton, and macrophyte groups we may add to the model!).

Can you get access to a supercomputer at your university / research facility? Yes I can!
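The 15-parameter layout described above (3 hydrodynamic parameters for each of the main body and the 4 branches) can be enumerated as in the sketch below. The region and parameter identifiers are hypothetical labels chosen for illustration, not CE-QUAL-W2 or spotpy names, and no bounds are given because the thread does not state any.

```python
from itertools import product

# Hypothetical labels: one main body plus 4 branches, as described for the lake.
REGIONS = ["main_body", "branch_1", "branch_2", "branch_3", "branch_4"]
# Hypothetical labels for the 3 hydrodynamic parameters being calibrated.
PARAMETERS = [
    "vertical_eddy_diffusivity",
    "horizontal_eddy_diffusivity",
    "light_extinction",
]


def parameter_names(regions, parameters):
    """Return one unique name per (region, parameter) pair."""
    return [f"{param}__{region}" for region, param in product(regions, parameters)]


names = parameter_names(REGIONS, PARAMETERS)
print(len(names))
for name in names:
    print(name)
```

Each of these names would then need its own sampling range in the calibration setup, which is why the search space grows quickly compared with the 3-parameter case.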

I hope I could give you some good info about what we are up to here in Brazil! Thanks for the answer again, have a good weekend!