Open JPMastrogiacomo opened 2 months ago
Hi - thanks for raising this!
Let me take a look and see if I can write a test for this - after which I would definitely appreciate a PR.
Okay, so here's a branch that has a bunch of tests that should all fail now when >2D datasets are incorrectly aggregated: https://github.com/ks905383/xagg/tree/fix_dotprod_extradims
Would you like to try your fix in a PR to that branch?
Thanks so much for writing the tests! I will look into it and submit a PR if I can manage to fix it.
Hi. This has been a very useful package that I will likely implement in my research. I ran into problem with the aggregate function when my data has a time dimension in addition to lat and lon. This is only a problem when impl='dot_product' but not when impl='for_loop'.
The issue stems from xagg/core.py line 674:
aggregated_array = normed_weights.dot(var_array_filled)
. It also aggregates over the extra dimension (eg time) as well, when we want to retain that dimension and match the behaviour of impl='for_loop'. The following change seems to solve the problem:aggregated_array = normed_weights.dot(var_array_filled, dims='loc')
.Then we pick up an extra column (eg time) in the df_combined dataframe, so we must remove it starting at line 682:
ds_combined = xr.Dataset(data_dict)
columns_to_remove = list(ds_combined.dims)
columns_to_remove.remove('poly_idx')
df_combined = ds_combined.to_dataframe().reset_index()
df_combined = df_combined.drop(columns=columns_to_remove)
Finally line 691 has to change as well to match impl='for_loop':
wm.agg[var] = wm.agg[var].apply(np.array).apply(lambda x: [[x]])
I am not super familiar with the inner workings of xagg, but I think this is working if ds has an extra dimension or not. If this seems reasonable I can submit a pull request. Thanks for making this package public.