hzambran / hydroPSO

Model-Independent Particle Swarm Optimisation for Environmental Models
https://cran.r-project.org/package=hydroPSO
GNU General Public License v2.0

How to determine npart--number of particles in the swarm? #16

Closed WatershedFlow closed 4 years ago

WatershedFlow commented 4 years ago

How should npart, the number of particles in the swarm, be determined? It seems that npart = 40 is used as the default.

Does npart depend on the number of parameters used in calibration, or on other factors? For example, I am calibrating the VIC model, which is a grid-based model. I have 6 parameters to calibrate, but each parameter is repeated 100 times because I have 100 grid cells. Each grid cell is independent and is calibrated independently, although I use the same value for every grid cell in each calibration run.

So what npart should I use in this case: 5, 40, or 100?

hzambran commented 4 years ago

Indeed, 40 particles is the default, and the number of particles to use depends on the number of parameters to be calibrated.

The number of particles required to calibrate N parameters does not grow linearly with N (curse of dimensionality), so you need to find out by trial which number works best for you. In my experience, it is best to start with a high number of particles and a low number of iterations to evaluate the performance of hydroPSO, and then reduce the number of particles and increase the number of iterations.
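As a rough illustration of trading particles against iterations under the same budget of model runs, here is a minimal sketch in Python. It is not hydroPSO and does not reproduce its algorithm or defaults: `pso`, `sphere`, the coefficients and the bounds are all assumptions made for the example, and the toy objective only stands in for a real model's goodness-of-fit.

```python
import random

def sphere(x):
    """Toy objective standing in for a model's goodness-of-fit (lower is better)."""
    return sum(v * v for v in x)

def pso(objective, dim, npart, maxit, lower=-5.0, upper=5.0, seed=0):
    """Very small global-best PSO; returns (best_value, n_evaluations)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.72, 1.49, 1.49  # textbook-style coefficients (assumed, not hydroPSO's)
    pos = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(npart)]
    vel = [[0.0] * dim for _ in range(npart)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    n_eval = npart  # the initial swarm counts as the first iteration
    g = min(range(npart), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(maxit - 1):
        for i in range(npart):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # keep the particle inside the parameter range
                pos[i][d] = min(upper, max(lower, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            n_eval += 1
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest_val, n_eval

# Same budget of 200 evaluations, split two ways (6 parameters).
wide = pso(sphere, dim=6, npart=40, maxit=5)     # many particles, few iterations
narrow = pso(sphere, dim=6, npart=10, maxit=20)  # few particles, more iterations
print("npart=40, maxit=5 :", wide)
print("npart=10, maxit=20:", narrow)
```

Which split does better depends on the objective and the random seed, so a comparison like this is only a starting point for trying both on the real model.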

In your case (6 parameters, 100 grid cells), the answer will depend on your calibration strategy. Calibrating each grid cell individually is usually not wise, because you can end up with very different parameter values in neighbouring cells. So, if your conceptual model allows it, you might define some hydrological response units (HRUs) with common parameter values and then calibrate the parameters for each HRU instead of for every cell (say you have 3 HRUs: you would need to calibrate 3x6=18 parameters instead of 100x6=600 parameters).
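The HRU idea can be sketched as follows: the optimiser only sees one flat vector of 18 values, and a lookup table spreads them over the 100 cells before the model is run. This is a hypothetical Python illustration; `cell_to_hru`, `expand_to_cells` and the round-robin assignment are assumptions for the example, not part of hydroPSO or VIC.

```python
N_CELLS, N_HRU, N_PAR = 100, 3, 6

# Hypothetical assignment of each grid cell to an HRU (simple round-robin here;
# in practice this would come from soil, land cover or similar maps).
cell_to_hru = [cell % N_HRU for cell in range(N_CELLS)]

def expand_to_cells(hru_params):
    """Map a flat vector of N_HRU * N_PAR values to one parameter list per cell."""
    if len(hru_params) != N_HRU * N_PAR:
        raise ValueError("expected %d values" % (N_HRU * N_PAR))
    per_hru = [hru_params[h * N_PAR:(h + 1) * N_PAR] for h in range(N_HRU)]
    return [per_hru[h] for h in cell_to_hru]

# One candidate parameter set as the optimiser would propose it (dummy values).
candidate = [float(i) for i in range(N_HRU * N_PAR)]
cell_params = expand_to_cells(candidate)

print(len(candidate), "calibrated values instead of", N_CELLS * N_PAR)
```

Cells that share an HRU receive identical parameter values, which is what keeps neighbouring cells of the same unit consistent.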

IHTH.

WatershedFlow commented 4 years ago

I also found it a good strategy to start with a large number of particles and a small number of iterations, using all parameters and the widest range for each parameter; then to continue with a smaller number of particles and only the few more important parameters.

I use the VIC model (through an R package of the VIC model) for my work; it is a grid-based model. I have 6 parameters for calibration and 198 grid cells. In my experience, it appears to work with 5~10 particles and 20 iterations, which gives about 100 to 200 model runs (particles x iterations).
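For planning purposes, the run count can be estimated with a small helper. This is a sketch that assumes each iteration evaluates every particle exactly once, so the number of model runs is roughly particles times iterations; `split_budget` is a hypothetical name, not a hydroPSO function.

```python
def split_budget(budget, npart_options):
    """For a fixed budget of model runs, list (npart, iterations) pairs that fit."""
    return [(n, budget // n) for n in npart_options if n <= budget]

# With about 100 affordable model runs:
for npart, maxit in split_budget(100, (5, 10, 40)):
    print("npart =", npart, "-> iterations =", maxit, "-> runs =", npart * maxit)
```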