thouska / spotpy

A Statistical Parameter Optimization Tool
https://spotpy.readthedocs.io/en/latest/
MIT License

Infeasible parameter sets during MCMC and eFAST sampling #145

Closed · juancastilla closed this issue 6 years ago

juancastilla commented 6 years ago

Hi @thouska et al: I have a model that may or may not converge depending on the parameter combination. The failing combinations are not trivial to characterise, i.e., this is not straightforward to fix by simply adjusting the max/min of the prior distributions; there are interaction effects between parameters. I do have control over what the model outputs for infeasible parameter sets: it can return either NaNs or an unrealistic simulated value that yields a low likelihood (e.g., sim_value=-9999).

Just wondering if anyone has come across this issue and how they have dealt with it?

I tried the sim_value=-9999 approach, but I think it can bias the convergence of the MCMC sampling. I then noticed (please correct me if I'm wrong) that DREAM.py, for example, has some protective code that discards chains with like=NaN (these are not stored in the csv database). I do get better convergence using this approach.

However, with FAST this may be a show-stopper (the same applies to Sobol and Morris), since these methods rely on a specific sampling strategy. Can anyone shed some light on whether the "missing" infeasible samples (parameter sets) will corrupt the FAST analysis? Roughly 15% of my samples are infeasible, so for 10,000 samples I get around 8,500 valid ones. FAST will still produce sensitivity indices, but are they reliable? For reference, the sketch below shows roughly how I handle infeasible runs in the setup class at the moment.
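This is only a minimal illustration, not my actual model: the parameter names, the `run_my_model` function, and the synthetic observations are placeholders, and the objective function is a plain negative RMSE.

```python
import numpy as np
import spotpy


def run_my_model(a, b):
    """Placeholder for the real model; fails for some (a, b) combinations."""
    if a + b > 9.5:  # stand-in for a non-trivial infeasible region
        raise RuntimeError('no convergence')
    return np.sin(np.linspace(0, 10, 100)) * a + b


class InfeasibleAwareSetup(object):
    """Sketch of a spotpy setup that returns a constant, very bad simulation
    whenever the model fails or produces non-finite values."""

    def __init__(self):
        self.params = [spotpy.parameter.Uniform('a', 0.0, 1.0),
                       spotpy.parameter.Uniform('b', 0.0, 10.0)]
        # placeholder observations of the same length as a typical simulation
        self.observations = list(np.sin(np.linspace(0, 10, 100)) * 0.5 + 5.0)

    def parameters(self):
        return spotpy.parameter.generate(self.params)

    def simulation(self, vector):
        try:
            sim = run_my_model(vector[0], vector[1])
            if np.any(~np.isfinite(sim)):
                raise ValueError('model returned NaN/Inf')
        except Exception:
            # infeasible parameter set: constant bad values instead of NaNs
            sim = [-9999.0] * len(self.observations)
        return sim

    def evaluation(self):
        return self.observations

    def objectivefunction(self, simulation, evaluation):
        # negative RMSE so that larger objective values are better
        return -spotpy.objectivefunctions.rmse(evaluation, simulation)
```

The same setup object can then be passed unchanged to spotpy.algorithms.dream, spotpy.algorithms.mc, or spotpy.algorithms.fast.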

thouska commented 6 years ago

Hi @juancastilla, thank you for your message. Regarding your questions, I'll try to give some rough guidance:

1) Yes, we have some users who have to deal with models that break for certain parameter combinations; unfortunately, this is common in the hydrology community. If you already have this under control, that is great. If you want to use FAST and/or a Bayesian approach with MCMC/DREAM, returning very bad simulations is already the best way to deal with this problem, e.g. `[-9999] * len(typical_model_simulation)`. I would also test `[0] * len(typical_model_simulation)`, which might be a little less confusing for the algorithms.

2) DREAM converges faster than MCMC in most situations. However, since both algorithms use a Metropolis acceptance decision, they should both be able to handle bad likelihoods, relatively independently of the option you choose in (1).

3) I ran into the same problem you describe with FAST. In my case around 10% of the model runs were unusable, but after some testing I had the impression that the results were still reliable. It is important, however, not to return NaN from the objective function, otherwise your sensitivity index might also become NaN; as in (1), choose a very bad objective function value instead. To narrow this down, you could take the hymod example, add a few lines that return -9999 as the objective function value in 15% of the cases, and then compare the results. I uploaded a first code basis for that in tutorial_fast_hymod.py; a sketch of the comparison follows below.
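This is only a rough sketch of that comparison, not the exact content of tutorial_fast_hymod.py: it assumes the hymod example setup shipped with spotpy (the module path may differ between versions), and the 15% failure rate is emulated by randomly overwriting the objective value.

```python
import numpy as np
import spotpy
# assumes the hymod example that ships with spotpy; module path may differ between versions
from spotpy.examples.spot_setup_hymod_python import spot_setup


class spot_setup_with_failures(spot_setup):
    """Sketch: pretend ~15% of model runs are infeasible by returning -9999."""

    def objectivefunction(self, simulation, evaluation, params=None):
        if np.random.rand() < 0.15:
            return -9999  # very bad objective value instead of NaN
        return super(spot_setup_with_failures, self).objectivefunction(simulation, evaluation)


if __name__ == '__main__':
    # run FAST on the original setup and on the one with artificial failures,
    # then compare the resulting sensitivity indices
    for setup in (spot_setup(), spot_setup_with_failures()):
        sampler = spotpy.algorithms.fast(setup, dbname='fast_hymod', dbformat='ram')
        sampler.sample(10000)
        results = sampler.getdata()
        # plotting helper as used in the spotpy FAST tutorials
        spotpy.analyser.plot_fast_sensitivity(results, number_of_sensitiv_pars=5)
```

If the indices of the two runs stay close, the "missing" 15% of feasible samples should not corrupt the FAST analysis much in your case.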