usgs / pestpp

tools for scalable and non-intrusive parameter estimation, uncertainty analysis and sensitivity analysis

using pestpp-ies while progressively increasing the complexity of inversion #192

Closed BJEANNOT0 closed 2 years ago

BJEANNOT0 commented 2 years ago

Hello,

I would like to describe the way I use pestpp-ies in order to get your opinion on how questionable this approach is. I deal with a hydrological model that is very sensitive to its parameters (a model of a fractured porous aquifer using a double-medium approach), with thousands of parameters and observations.

In order to lead pest++ in the right direction, my strategy is the following (sketched in code after this list):

- estimate parameters with pestpp-ies, trying to fit the observations of 3 piezometers;
- use the parameter set that produces the lowest phi for 3 piezometers as the base realization for an inversion with 6 piezometers;
- use the parameter set that produces the lowest phi for 6 piezometers as the base realization for an inversion with 9 piezometers;
- use the parameter set that produces the lowest phi for 9 piezometers as the base realization for an inversion with 12 piezometers;
- use the parameter set that produces the lowest phi for 12 piezometers as the base realization for an inversion with 15 piezometers.
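A minimal sketch of one warm-start step, assuming the usual pestpp-ies output naming (`<case>.phi.actual.csv` and `<case>.<iteration>.par.csv`); the case names, iteration number, and the phi-csv column layout below are placeholders and assumptions, not taken from this thread:

```python
import pandas as pd
import pyemu

# placeholders: 3-piezometer case name and the last iteration that was run
case, it = "pz03", 3
phi = pd.read_csv(f"{case}.phi.actual.csv")
par_ens = pd.read_csv(f"{case}.{it}.par.csv", index_col=0)

# take the last recorded iteration and drop the summary columns so only
# the per-realization phi columns remain (this column layout is an assumption)
last = phi.iloc[-1].drop(
    ["iteration", "total_runs", "mean", "standard_deviation", "min", "max"],
    errors="ignore",
)
best_real = last.astype(float).idxmin()

# push that realization's parameters into parval1 of the next step's control
# file so it becomes the "base" realization for the 6-piezometer run
pst = pyemu.Pst("pz06.pst")
pst.parameter_data.loc[par_ens.columns, "parval1"] = par_ens.loc[best_real, :].values
pst.write("pz06_warmstart.pst")
```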

I do it this way because I noticed that if I try to do the parameter estimation for all 15 piezometers directly, I get less satisfactory results (e.g. a worse fit for the best realization). I understand, however, that the principle of pestpp-ies is to sample a posterior parameter distribution, and I am almost certain this is no longer exactly what I am doing. In practice, this procedure lets me avoid getting stuck in a local minimum of the objective function, and most of the time I obtain at least one realization with a very low objective function. But I am not sure how to interpret the ensemble of realizations I get at the end of the process.

Do you have any thoughts on the way I use pestpp-ies? Thank you for your help. Best regards

cnicol-gwlogic commented 2 years ago

One thing I wondered while reading your approach: are you using localisation (either auto or manual)? I wonder if you don't have that option turned on, and whether gradually adding more obs iteratively is effectively doing the same thing by hand.
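For reference, turning on the automatic adaptive localiser is a one-option change in the control file; a minimal pyemu sketch (the control-file name is a placeholder):

```python
import pyemu

# enable pestpp-ies automatic adaptive localization via the ies_autoadaloc
# option (the control-file name "case.pst" is a placeholder)
pst = pyemu.Pst("case.pst")
pst.pestpp_options["ies_autoadaloc"] = True
pst.write("case_autoadaloc.pst")
```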

BJEANNOT0 commented 2 years ago

You are right about me not using localization. I used it before but ended up giving it up. I will try again and keep you updated! Thank you for your answer.

cnicol-gwlogic commented 2 years ago

For what it's worth I have only ever used the auto localiser and have always had success (...once I iron out all my other user errors in the model workflow).

jtwhite79 commented 2 years ago

That is an interesting approach - so are you adding new observations each time so that the second step is the first 3 piezos plus some new ones? This is almost a sequential estimation formulation - @aymanalz might have some interesting ideas about this...

How many realizations are you using? In theory, if you use too few realizations, you wouldn't be able to fit a rich observation data set as well as you could with more realizations - the number of realizations is more or less a representation of how many dimensions of the "solution space" you can occupy.
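A rough check of ensemble size against the information content of the observation data set, assuming the standard pestpp-ies prior parameter-ensemble output name; the file names and the value 300 below are placeholders:

```python
import pandas as pd
import pyemu

# compare the number of realizations with the number of non-zero-weighted obs
pst = pyemu.Pst("case.pst")
prior_en = pd.read_csv("case.0.par.csv", index_col=0)  # assumed output name
nreals = prior_en.shape[0]
nnz_obs = (pst.observation_data.weight > 0).sum()
print(f"{nreals} realizations vs {nnz_obs} non-zero-weighted observations")

# if more degrees of freedom are needed, bump the ensemble size and rerun
pst.pestpp_options["ies_num_reals"] = 300
pst.write("case_more_reals.pst")
```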

@cnicol-gwlogic makes a good point about localization - that can also help fit rich datasets with smaller ensembles...It might be interesting to look at which parameters are changing the most between steps...
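One way to do that comparison between steps, assuming the per-step posterior parameter-ensemble csvs are available (file names and iteration numbers below are placeholders):

```python
import pandas as pd

# compare the ensemble mean of each parameter between two steps to see
# which parameters moved the most (placeholder names for the 3- and
# 6-piezometer posterior parameter-ensemble csvs)
step3 = pd.read_csv("pz03.3.par.csv", index_col=0)
step6 = pd.read_csv("pz06.3.par.csv", index_col=0)

shift = (step6.mean() - step3.mean()).abs().sort_values(ascending=False)
print(shift.head(20))  # the 20 parameters that changed the most between steps
```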

And, most importantly, how well do you think you should be fitting the observations given the simplifications of the model and what forecasts you want to make? Is it meaningful to fit the observations so well? From a bias-variance tradeoff view, underfitting and keeping more variance in the posterior parameter ensemble (and therefore the forecasts) might be a more conservative approach...

aymanalz commented 2 years ago

Gradual assimilation of observations indeed makes this like sequential assimilation, @jtwhite79. I agree with all the above suggestions regarding using localization and increasing the number of realizations… they really can make a big difference. Two comments that I would like to add:

(1) You mentioned that when directly estimating the parameters using the 15 piezometers, you get less satisfactory results (e.g. a worse fit for the best realization). I think you should test the fit of the average of the posterior realizations – not the best one.

(2) I would also like to add that diagnosing the initial residual errors can help reveal issues with the prior parameters (or what you call base values). In particular, you may need to look at the mean of the residual errors at the 15 piezometers. A good initial ensemble (or base value) is one that produces both positive and negative residual errors – in other words, we would like the model predictions to envelope (lie both above and below) the observed values.
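A quick way to run that diagnostic, assuming the simulated-output ensemble that pestpp-ies writes at iteration 0 (the file names and csv layout below are assumptions):

```python
import pandas as pd
import pyemu

# check whether the prior (iteration-0) simulated ensemble brackets the
# observed values, i.e. produces both positive and negative residuals
pst = pyemu.Pst("case.pst")
obs = pst.observation_data
sim0 = pd.read_csv("case.0.obs.csv", index_col=0)  # assumed output name

for oname in obs.loc[obs.weight > 0, "obsnme"]:
    resid = sim0[oname] - obs.loc[oname, "obsval"]
    brackets = (resid.min() < 0) and (resid.max() > 0)
    print(f"{oname}: mean residual {resid.mean():.3f}, brackets observed value: {brackets}")
```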

wkitlasten commented 2 years ago

Perhaps try using the mean parameter vector from the 3 piezometer runs as the base realization for an inversion with 6 piezometers, etc? Something like this:

pst.parameter_data['parval1'] = pe.mean()
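For completeness, a slightly fuller sketch of the same idea (the file names and iteration number of the posterior ensemble are placeholders, and the csv layout is assumed):

```python
import pandas as pd
import pyemu

# load the 3-piezometer posterior parameter ensemble and use its column-wise
# mean as parval1 (the "base" values) for the 6-piezometer control file
pst = pyemu.Pst("pz03.pst")
pe = pd.read_csv("pz03.3.par.csv", index_col=0)
pst.parameter_data.loc[pe.columns, "parval1"] = pe.mean()
pst.write("pz06.pst")
```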

Also, localization (or loads of realizations) seems to be key with ies.

BJEANNOT0 commented 2 years ago

Thank you everybody for your extensive feedback. Indeed, greatly increasing the number of realizations produces results that are quite similar to what I get with my approach. I tried localization but I seem to always get a segmentation fault when doing it, probably because of memory considerations. I might post a question about it later. As for the current issue, I think we can consider it closed. Thank you again for your support.

BJEANNOT0 commented 2 years ago

Hello again,

I have to write again after a few more attempts to emphasize that, as you stated, localization really did help. I am surprised to see that the objective function decreases twice as fast with 100 realizations and automatic adaptive localization as with 1000 realizations without any localization.

Thank you.

cnicol-gwlogic commented 2 years ago

Wow! That's an interesting comparison. My typical model would have, say, anything from 10k to 50k obs and 5k to 30k pars. These are often very information-rich (heavily impacted confined borefield drawdowns etc, maybe plus surface flow and shallow unconfined gw obs). I usually get away with an ensemble of 300, but once had to jump up to 600 to avoid ensemble collapse (that was a very info-dense obs set). My 2 cents. Thanks for the info.

jtwhite79 commented 2 years ago

@BJEANNOT0 I'm interested to see those 100-reals-plus-localization vs 1000-reals results. In theory, beyond helping with spurious correlation, localization also provides more degrees of freedom to fit observations, but I've not been able to manufacture a case to demonstrate that effect...

BJEANNOT0 commented 2 years ago

Understood. I will make a chart to show you my comparisons in the next few days. However, I can already tell you that the ratio I gave before for 1000 realizations vs 100 with localization was only valid for the first few iterations; after a while it becomes the opposite (i.e. the inversion with 1000 reals becomes the better one after a few iterations). I will show this with a chart in my next post. Best regards,

BJEANNOT0 commented 2 years ago

@jtwhite79 @cnicol-gwlogic As promised, here is a comparison of using a lot of realizations vs automatic adaptive localization.

[image: objective function vs iteration, 1000 realizations without localization vs 100 realizations with automatic adaptive localization]

It seems that localization enables a quicker decrease of the objective function during the first few iterations, but that in the end using a lot of realizations performs better.

Also, regarding ensemble collapse (which I think is measured by the number of parameters in the "n CV decr" column of the case.rec file), I noticed that it does happen when using only 100 realizations without localization. Both localization and increasing the number of realizations reduce the intensity of the collapse. This raises a question, however: what percentage of parameters in the "n CV decr" column is acceptable before the ensemble should be considered collapsed? I noticed that performing many iterations makes the number of parameters in the "n CV decr" column grow, iteration after iteration, and I wonder when I should stop doing further iterations based on this output. Thanks for your inputs. Best regards,
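A rough way to quantify this outside of the rec file is to track how much the coefficient of variation of each parameter has dropped relative to the prior, iteration by iteration; a sketch (the file names, iteration count, and 90% threshold are placeholders, not values from this thread):

```python
import pandas as pd

# fraction of parameters whose coefficient of variation has dropped by more
# than 90% relative to the prior ensemble, as a crude collapse indicator
prior = pd.read_csv("case.0.par.csv", index_col=0)  # assumed prior-ensemble name
cv0 = prior.std() / prior.mean().abs()

for it in range(1, 6):  # iterations that were actually run
    ens = pd.read_csv(f"case.{it}.par.csv", index_col=0)
    cv = ens.std() / ens.mean().abs()
    frac = ((cv / cv0) < 0.1).mean()
    print(f"iteration {it}: {frac:.1%} of parameters lost >90% of their prior CV")
```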

jtwhite79 commented 2 years ago

The question about the number of iterations is critical. At the core, these ensemble methods are still optimization algorithms, which means they are going to drive the objective function as low as they can, and as this happens, the parameter variance across realizations drops quickly, especially in those later iterations. So it's up to us to decide where to stop trying to fit. And it's super subjective...the parameter change summary tries to provide some insights about this issue.

If you think about an imperfect model being used to assimilate data, it leads to the bias-variance tradeoff that is well-known in machine learning. In this framework, I prefer to "underfit", that is, purposefully not fit the observations as well as possible, because underfitting means you prefer to retain parameter variance (and therefore forecast variance) rather than fit very well but risk bias in the parameter values (and therefore bias in the forecasts). In the groundwater modeling world, I would rather be uncertain than biased!

Typically, I will run IES out for several iterations but usually take the 2nd or 3rd iteration results as the posterior. Recently, I was part of an effort where we used the 1st iteration results with a large initial lambda value to ensure we were conservative in the posterior parameter variance (and doing our best to avoid forecast bias)...
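In practice, adopting an earlier iteration as the posterior just means keeping that iteration's ensemble files; a sketch (the file names and the chosen iteration are placeholders):

```python
import pandas as pd

# keep iteration 2's parameter and simulated-output ensembles as the posterior
posterior_iter = 2
par_post = pd.read_csv(f"case.{posterior_iter}.par.csv", index_col=0)
obs_post = pd.read_csv(f"case.{posterior_iter}.obs.csv", index_col=0)
par_post.to_csv("case.posterior.par.csv")
obs_post.to_csv("case.posterior.obs.csv")
```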

BJEANNOT0 commented 2 years ago

Thank you for your answer, it really helps me understand what I am doing when using pestpp-ies. Regards,