lee1043 opened this issue 11 months ago
Thanks for this feature suggestion, Jiwoo. I agree, we should have a skipna flag to replicate what Xarray offers.
Additional context
It would be even more helpful if users could set some criteria. For example, letting the user decide the fraction of NaN values.
Let's say I have 10 values, which include 2 NaNs. I want to get an average with skipna=True. But when there are 3 NaN values, I want the average to be NaN.
This is going to help the obs4MIPs process when handling time-varying NaN values due to missed observation points.
This sounds similar to the weight_threshold feature mentioned here: https://github.com/xCDAT/xcdat/issues/531.
Can you provide some pseudo-code? Better yet, a prototype Python implementation would be great.
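As a possible starting point, here is a rough prototype of the NaN-fraction criterion from the request, written in plain NumPy rather than against xcdat. The function name mean_with_nan_threshold and its max_nan_fraction parameter are hypothetical (not part of xcdat or xarray): the function averages the non-NaN values, like skipna=True, and returns NaN wherever the fraction of NaNs along the reduction axis exceeds the threshold.

```python
import numpy as np


def mean_with_nan_threshold(values, max_nan_fraction, axis=None):
    """Hypothetical helper: a NaN-skipping mean that returns NaN when too many values are missing.

    max_nan_fraction is the largest allowed fraction of NaNs (0.2 means "at most 20%").
    """
    values = np.asarray(values, dtype=float)
    nan_mask = np.isnan(values)

    # Fraction of NaN values along the reduction axis.
    nan_fraction = nan_mask.mean(axis=axis)

    # Mean of the non-NaN values only (the skipna=True behaviour).
    valid_count = (~nan_mask).sum(axis=axis)
    valid_sum = np.where(nan_mask, 0.0, values).sum(axis=axis)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = valid_sum / valid_count  # 0 / 0 -> NaN for all-NaN slices

    # Replace results whose NaN fraction exceeds the threshold with NaN.
    return np.where(nan_fraction > max_nan_fraction, np.nan, mean)


ten_values = [1, 2, np.nan, 4, 5, 6, 7, np.nan, 9, 10]  # 2 NaNs -> 20% missing
print(mean_with_nan_threshold(ten_values, max_nan_fraction=0.2))  # 5.5

ten_values[4] = np.nan  # now 3 NaNs -> 30% missing
print(mean_with_nan_threshold(ten_values, max_nan_fraction=0.2))  # nan
```

With max_nan_fraction=0.2, the 10-value example with 2 NaNs averages the 8 remaining values, while 3 NaNs (30%) gives NaN. Passing axis lets the same rule apply per grid cell or per time step.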
I think an alternative to skipna is for the user to drop NaN values before calculating the average. @pochedls, any thoughts on this specific enhancement?
I'm wondering if this would work. If we were dealing with a time series:
ds.time = ["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01"]
ds.ts = [1, 2, np.nan, 4]
I think dropping the NaN would also drop the time point, which would create problems for a lot of applications. If I instead had a [lat, lon] matrix:
ts = [[1, 2, 3],
      [4, 5, 6],
      [7, np.nan, 9]]
I'm not sure how this would work. What would the shape of the ts matrix be? It would no longer be a [lat, lon] grid.
Or am I thinking about this the wrong way?
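A small NumPy illustration of both cases, with plain arrays standing in for the xarray objects. Note that xarray's own DataArray.dropna(dim) differs from boolean masking: with the default how="any" it drops every label along dim that contains a NaN, so on the [lat, lon] grid it would remove a whole latitude row or longitude column rather than a single cell. Either way the grid changes, whereas skipping NaNs during the average leaves it intact.

```python
import numpy as np

# The 1-D time series from the comment above.
time = np.array(["2010-01-01", "2010-02-01", "2010-03-01", "2010-04-01"], dtype="datetime64[D]")
ts = np.array([1.0, 2.0, np.nan, 4.0])

# Dropping NaNs also drops the matching time point: 4 samples become 3.
keep = ~np.isnan(ts)
time_dropped, ts_dropped = time[keep], ts[keep]
print(time_dropped)  # ['2010-01-01' '2010-02-01' '2010-04-01']

# The [lat, lon] example: masking out NaNs cannot keep a rectangular grid,
# so NumPy returns a flat 1-D array of the 8 remaining values.
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, np.nan, 9.0]])
print(grid[~np.isnan(grid)].shape)  # (8,)

# Skipping NaNs inside the reduction (what skipna=True does) leaves the grid
# untouched and only changes how each average is computed.
print(np.nanmean(grid, axis=1))  # per-lat means: [2. 5. 8.]
```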
@tomvothecoder @pochedls Sorry that I haven't fully followed this, but I'm wondering if there is any chance to follow up on this, as Celine reached out about the same issue: she wants to compute a spatial average on data that include NaNs.
@lee1043, I don't think I can work on this soon. This could be an easy PR (or a "dev day" issue) depending on the complexity of the implementation. There might also be workarounds using get_weights (and then computing the mean yourself).
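One possible shape for that workaround, sketched in NumPy: zero out the weights wherever the data are NaN, then divide the weighted sum by the sum of the remaining weights. The helper name nan_aware_weighted_mean is hypothetical, and the cos(latitude) weights below are only a stand-in for whatever get_weights returns; the exact get_weights call and its arguments are not shown here.

```python
import numpy as np


def nan_aware_weighted_mean(data, weights):
    """Weighted mean over all cells that ignores NaNs (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # Broadcast weights to the data shape, e.g. (lat, 1) latitude weights to (lat, lon).
    weights = np.broadcast_to(np.asarray(weights, dtype=float), data.shape)
    valid = ~np.isnan(data)

    # NaN cells contribute nothing to either the numerator or the denominator.
    weight_sum = np.where(valid, weights, 0.0).sum()
    if weight_sum == 0:
        return np.nan  # every cell is NaN (or has zero weight)
    return np.where(valid, data * weights, 0.0).sum() / weight_sum


lat = np.array([-30.0, 0.0, 30.0])
# Stand-in for spatial weights: cos(latitude), broadcast across longitude.
lat_weights = np.cos(np.deg2rad(lat))[:, np.newaxis]
ts = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0],
               [7.0, np.nan, 9.0]])
print(nan_aware_weighted_mean(ts, lat_weights))
```

With uniform weights this reduces to np.nanmean, which makes for a quick sanity check.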
If you or somebody else can provide pseudo-code or a prototype Python implementation, it can help speed up the implementation process for whenever @pochedls or I (or somebody else) has time. My dev time for new xCDAT features will be limited for the next few months because of conferences and other priorities.
Is your feature request related to a problem?
The skipna=None parameter of xarray's mean function lets the user decide whether to skip NaN values when averaging (so the average is calculated from the non-NaN values only) or to return NaN as the average whenever any NaN values are present: https://docs.xarray.dev/en/stable/generated/xarray.DataArray.mean.html
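For reference, the two behaviours map onto NumPy's mean and nanmean. In xarray, DataArray.mean(skipna=False) propagates NaNs like np.mean, while skipna=True (and the default skipna=None for float data) skips them like np.nanmean.

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

# skipna=False behaviour: a single NaN makes the whole average NaN.
print(np.mean(values))  # nan

# skipna=True behaviour: average only the non-NaN values, (1 + 2 + 4) / 3.
print(np.nanmean(values))
```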
Describe the solution you'd like
Pass the skipna keyword through to here: https://github.com/xCDAT/xcdat/blob/623814821a748bd2e2acc52971b359550c31913b/xcdat/spatial.py#L737. Similarly for the temporal average functions where .mean is used.
Describe alternatives you've considered
No response
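A hypothetical sketch of what passing skipna through could look like for a weighted spatial average over a (time, lat, lon) array, written in NumPy. The function name spatial_average and its signature are illustrative only; the actual code at spatial.py#L737 may be structured differently. Following xarray, skipna=None is treated here as "skip NaNs" for float data.

```python
import numpy as np


def spatial_average(data, weights, skipna=None):
    """Hypothetical weighted average over the last two (lat, lon) axes with a skipna switch."""
    data = np.asarray(data, dtype=float)
    weights = np.broadcast_to(np.asarray(weights, dtype=float), data.shape)
    axes = (-2, -1)

    # Mirror xarray's default: for float data, skipna=None means "skip NaNs".
    if skipna is None:
        skipna = True

    if not skipna:
        # NaNs propagate: any NaN in a time step's grid makes that average NaN.
        return (data * weights).sum(axis=axes) / weights.sum(axis=axes)

    # Drop NaN cells from both the weighted sum and the sum of weights.
    valid = ~np.isnan(data)
    weighted_sum = np.where(valid, data * weights, 0.0).sum(axis=axes)
    weight_sum = np.where(valid, weights, 0.0).sum(axis=axes)
    with np.errstate(invalid="ignore", divide="ignore"):
        return weighted_sum / weight_sum  # all-NaN time steps -> NaN


# Two time steps on a 2 x 2 grid; one cell is missing at the second time step.
data = np.array([[[1.0, 2.0], [3.0, 4.0]],
                 [[1.0, np.nan], [3.0, 4.0]]])
weights = np.ones((2, 2))
print(spatial_average(data, weights, skipna=False))  # [2.5 nan]
print(spatial_average(data, weights, skipna=True))
```

Keeping the time dimension in the output means time-varying gaps, such as missed observation points, only affect the time steps where they occur.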