Need to rethink min model error

openghg / openghg_inversions

University of Bristol Atmospheric Chemistry Research Group RHIME Inversion code (with openghg dependency)

MIT License

5 stars 0 forks source link

Need to rethink min model error #136

Open joe-pitt opened 1 month ago

joe-pitt commented 1 month ago

In the early years of the HFO inversions, InTEM and ELRIS have almost zero emissions, while RHIME estimates substantial emissions for the whole time period, even though the obs show very few enhancements. The model error also looks wrong for RHIME in early years. Will elaborate on this further once I've investigated...

joe-pitt commented 1 month ago

OK I think I know (at least partially) what the problem is here. For the HFO runs we are using the same prior for all time, but in the early years there are basically no emissions, so obs are virtually flat and the prior mf timeseries has a lot of variability. This means the stdev(obs-prior_mfs) is large, and hence we apply a large min model error. This means that the posterior mf timeseries is allowed to vary significantly from the obs, in order to reduce the posterior-prior flux map difference. The upshot is that we get a strong influence of the prior in these early years, and the posterior fluxes aren't allowed to go to (near) zero.

I think for now I will have to turn off the min model error for my hfo runs. Going forward it would be good to come up with some other ideas for min model error calculations that would not be so sensitive to errors in the prior emission map.

joe-pitt commented 1 month ago

Thinking about it, this will impact any gas for which we are using a prior map that we don't have much confidence in, so I think it is relevant to the other PARIS runs too.

joe-pitt commented 1 month ago

One idea that would be somewhat similar to the intem approach (although not exactly the same). How about an option where min_error is equal to the median enhancement above prior baseline (maybe for MHD?), after filtering out points where the obs are below baseline? I'd need to work out how to implement this in the hbmcmc code, but in the icos jupyter notebook what I mean is:

ds_all_mf_sliced['rhime_comb_allobs'].Yobs.values = xr.where(ds_all_mf_sliced['rhime_comb_allobs'].Yobs < ds_all_mf_sliced['rhime_comb_allobs'].YaprioriBC,
                                                             np.nan,
                                                             ds_all_mf_sliced['rhime_comb_allobs'].Yobs).values
np.nanmedian(ds_all_mf_sliced['rhime_comb_allobs'].Yobs.values - ds_all_mf_sliced['rhime_comb_allobs'].YaprioriBC.values)

hdelongueville commented 2 weeks ago

In the same idea, for gases like HFC-227ea where sporadic pollution events happen, assuming the errors are mean zero is not really accurate (I think..). (y - y_mod) should have an asymmetric distribution and therefore (y - y_mod) - mean(y - y_mod) is not appropriate for this case.

hdelongueville commented 2 weeks ago

Correction for HFC-227: the current residual error method should be appropriate but may need to be sharpened, for example by calculating a min error for each site instead of one single min error.

brendan-m-murphy commented 1 week ago

In the same idea, for gases like HFC-227ea where sporadic pollution events happen, assuming the errors are mean zero is not really accurate (I think..). (y - y_mod) should have an asymmetric distribution and therefore (y - y_mod) - mean(y - y_mod) is not appropriate for this case.

If this is the case, we might want to use a "robust" distribution for the likelihood instead of a normal distribution. I have a book that recommends using a Student-T distribution, which has longer tails. I think this works by making big pollution events "less unlikely" so they have less influence on the posterior.

Without more time resolution in the flux, I don't think we can capture sporadic events, so I guess the best we can do it not fit to them too much.