Open hdelongueville opened 2 months ago
Here's a suggestion for dealing with this within inversions:
From the openghg get_obs_surface function, this is the relevant functionality:
# Resampling may introduce NaNs, so remove, if not keep_missing
if keep_missing is False:
    ds_resampled = ds_resampled.dropna(dim="time")
https://github.com/openghg/openghg/blob/devel/openghg/retrieve/_access.py#L407
So at the moment this just drops a time point if any data variable has a NaN value there. To help with this, we could pass a subset argument to dropna listing the specific variables to check, e.g. something like:
check_nan_subset = ["mf", "mf_repeatability", "mf_variability", ...]
...
ds_resampled = ds_resampled.dropna(dim="time", subset=check_nan_subset)
And we could make it so that subset could be specified by the user, with some useful default as shown above.
The question would be: what defaults would be sensible for this and cover enough cases?
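To make the proposal concrete, here is a minimal sketch of a user-configurable subset with a default. It is not the openghg implementation: the helper name drop_missing, the constant DEFAULT_NAN_SUBSET and its value are hypothetical, and the toy dataset only imitates a resampled observation dataset. The only library call relied on is xarray's Dataset.dropna with its subset argument.

```python
import numpy as np
import xarray as xr

# Hypothetical default: only the mole fraction itself must be present.
DEFAULT_NAN_SUBSET = ("mf",)


def drop_missing(ds, subset=DEFAULT_NAN_SUBSET):
    """Drop time points where any variable named in `subset` is NaN."""
    # Skip requested names that this particular dataset does not contain.
    present = [name for name in subset if name in ds.data_vars]
    if not present:
        return ds
    return ds.dropna(dim="time", subset=present)


# Toy stand-in for ds_resampled: sttb is entirely NaN.
time = np.array(
    ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    dtype="datetime64[ns]",
)
ds_resampled = xr.Dataset(
    {
        "mf": ("time", [1.0, np.nan, 3.0, 4.0]),
        "mf_variability": ("time", [0.1, 0.2, np.nan, 0.4]),
        "sttb": ("time", [np.nan] * 4),
    },
    coords={"time": time},
)

# Default subset: only the time point with a missing mf is dropped.
cleaned = drop_missing(ds_resampled)

# User-specified subset: also require mf_variability to be present.
strict = drop_missing(ds_resampled, subset=("mf", "mf_variability"))
```

With the default, the all-NaN sttb variable no longer empties the dataset; a stricter caller can still opt in to checking more variables.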
The function get_obs_surface is returning data with no values in it.
From this it also sounds like there's a second issue around get_obs_surface returning empty data, which may not be helpful. As an alternative, is this something that could be checked for, with an error raised rather than returning the data? Would that be preferable?
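As a rough illustration of the "raise rather than return" option, the sketch below checks the length of the time dimension after dropping NaNs. The helper name require_non_empty and the error wording are hypothetical, and where such a check would sit inside get_obs_surface is left open.

```python
import numpy as np
import xarray as xr


def require_non_empty(ds, dim="time"):
    """Raise instead of silently returning a dataset with no data along `dim`."""
    if ds.sizes.get(dim, 0) == 0:
        raise ValueError(
            f"No data left along '{dim}' after dropping NaNs; "
            "consider keep_missing=True or checking fewer variables."
        )
    return ds


# Toy dataset: sttb is entirely NaN, so a plain dropna removes every time point.
time = np.array(["2020-01-01", "2020-01-02"], dtype="datetime64[ns]")
ds = xr.Dataset(
    {"mf": ("time", [1.0, 2.0]), "sttb": ("time", [np.nan, np.nan])},
    coords={"time": time},
)

try:
    require_non_empty(ds.dropna(dim="time"))
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Whether an error or a warning is more appropriate probably depends on how downstream code, such as the inversions, handles missing sites.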
This sounds like a good plan to me. The default should definitely include mf - maybe that is the only one that is absolutely essential?
Did anyone have a quick example of the data that caused this problem by the way? Would be useful to have to be able to add a check in for this.
Would it be possible to get an example of this data? Would be useful to allow checks to be added for these issues.
I currently only have access to the object store. @joe-pitt, think this might be something you could help with?
An example would be the files in: /group/chem/acrg/obs_raw/EYE-AVE-PAR/EYE-AVE-PAR_2.2. Many of these have NaNs for things like LTR, STTB and Unc_n2o. At the moment the icos standardise function only loads unc_n2o (at one point I experimented with adding the others, hence sttb is mentioned in the original post). See also this issue: https://github.com/openghg/openghg_inversions/issues/212
What is your issue?
The function get_obs_surface is returning data with no values in it. This issue is caused by the presence of variables filled with NaN values, which are not used in the process.
For example, if there is a variable sttb that is full of NaNs, every time point is dropped because of it.
A quick fix is to set keep_missing=True, to skip the step that drops the NaNs. What is the best long term solution though?
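The behaviour described here can be reproduced with plain xarray, independently of openghg. The sketch below uses a made-up dataset and a hypothetical helper, all_nan_variables, to show how one unused all-NaN variable empties the result and how to identify the variable responsible.

```python
import numpy as np
import xarray as xr


def all_nan_variables(ds):
    """Return the names of data variables that contain only NaN values."""
    return [
        name for name, da in ds.data_vars.items() if bool(da.isnull().all())
    ]


# Made-up data: mf is complete, sttb is entirely NaN.
time = np.array(
    ["2020-01-01", "2020-01-02", "2020-01-03"], dtype="datetime64[ns]"
)
ds = xr.Dataset(
    {
        "mf": ("time", [1.0, 2.0, 3.0]),
        "sttb": ("time", [np.nan, np.nan, np.nan]),
    },
    coords={"time": time},
)

n_before = ds.sizes["time"]
# dropna with no subset checks every variable, so sttb removes all time points.
n_after = ds.dropna(dim="time").sizes["time"]
culprits = all_nan_variables(ds)
```

Running a check like this on a retrieved dataset is one way to find which variables are worth excluding from the NaN check.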