min_error issues umbrella

qq23840 commented 2 months ago

After our discussion about min_error on Wednesday, I thought I would collate some of the issues here. So far my thoughts are:

There should be some sort of error/warning message when you pass arguments to min_error and also calculate_min_error. I did this by mistake today, and they produce conflicting behaviour - calculate_min_error overrides min_error, but it's not clear this is what's happening when you pass both in the .ini file
That being said, I think the behaviour when min_error = 10.0 or similar is passed into the .ini file is as expected. The problem I've had is with verifying this : the min_model_error variable and attribute are just the single value passed (or calculated by the residual/percentile method) repeated nmeasure times. I think it would be more informative to have the output variable report 'epsilon', which is the error actually passed to the likelihood in L310 of inversion_pymc.py. This does vary with time, although in practice if a large min_error is specified it will default to this most of the time.
My issue is coming somewhere else - I'm fairly confident the Likelihood is sufficiently broad, but some other step is meaning that my modelled obs are coming out with very narrow error bars. I'm looking into what that might be now....

Feel free to add other issues with min_error to this one - just thought I'd summarise here.

hdelongueville commented 2 months ago

Thanks for opening this issue Ben! I'd like to add that maybe a good way to add the min_error to the rhime output would be as a variable with the dimension (nsite), instead of (nmeasure) and also not as an attributes.

qq23840 commented 2 months ago

Maybe a 'min_error' variable with dimension nsite that shows the input min error, and a 'model error' variable with dimensions (nsite, nmeasure) that is what error values are actually passed to the inversion for each site (which is the larger of the min_error and the pollution-scaled error) ?

brendan-m-murphy commented 2 months ago

Thanks Ben!

I'm working on incorporating some the PARIS formatting code into inversions, which I think will help with this. (E.g. in PARIS formatting, epsilon is used explicitly in the outputs.)

min_error was originally a single value, so I put it in the attributes of the output (since we put info about the priors and so on there). If you specify min_error and make calculate_min_error = None, that is still the case. Otherwise, min_error is actually an array, but still stored in the attributes of the output. It turned out that this worked with the PARIS formatting code without having to modify anything, so I left it as is, since the long term plan was to move a lot of that code into inversions.

So, if you use calculate_min_error, then the min_error attribute will actually be a vector with the same length as, say Y (so could have nmeasure as coordinates).

Going forward, reporting min_error as a data variable, and reporting epsilon sounds like a good plan.

brendan-m-murphy commented 2 months ago

Also a warning for passing both min_error and calculate_min_error seems like a good idea. Alternatively, we could just have one parameter min_error in the ini, that could either be a float value, or "percentile", or "residual", and would either use the value provided, or calculate a value based on the specified method (this would probably mean fixedbasisMCMC would need minor changes).

qq23840 commented 2 months ago

I think having a single parameter would be far more elegant for min_error

openghg / openghg_inversions

min_error issues umbrella #203