Closed Plantain closed 8 months ago
Hi! Thanks for submitting the bug. On a first glance, this seems to be more xarray-related but we should definitely investigate where this is coming from. Would you be able to share your data file so I can try to reproduce it? Also, which xarray version are you using? The show_versions()
output seems broken.
Hi @Plantain, just wanted to check in what the status on the bug is.
I am using Xarray 2023.7.0 and Xwrf 0.0.2 in Python 3.9.17.
I also encountered this same issue. Through debugging, I was able to determine that in my case XTIME
is the variable causing Xarray's to_netcdf()
to error.
In XWRF _decode_times()
, I see this comment:
# make XTIME be consistent with its description
Why is this needed? I don't quite understand the purpose of this code: https://github.com/xarray-contrib/xwrf/blob/48bc0611668fd6498d3b29ad3f084f4310ff4c2e/xwrf/postprocess.py#L28-L35
It seems this code is only ever invoked if decode_times=None
(default) or decode_times=True
is passed to xr.open_dataset
. In this case, XTIME
has a dtype of <M8[ns]
(same as datetime64
I think). Otherwise, the actual floating point number of minutes in XTIME
is retained, and ignored by XWRF.
Looking at the output of the current code in XWRF, it seems that what it does is convert a datetime64
object to a timedelta64
representing the total number of nanoseconds
in the datetime64
time in XTIME
. Later on, when Xarray attempts to write this to netcdf, the type of XTIME
has changed out from under Xarray since the read of the file. Xarray correctly calls encode_cf_timedelta
because the type of the object is a timedelta64
inside XTIME
at the time of writing, but the units that Xarray has registered with the XTIME
variable from decoding the file are not for type timedelta64
, they are datetime64
. It finds what it thinks are timedelta64
units that do not end with s
, and then tries to force datetime64
units into timedelta64
units by adding an s
:
minutes since 2020-01-02 18:00:00s
Which is not in the dictionary of acceptable timedelta64
units in Xarray:
def _netcdf_to_numpy_timeunit(units: str) -> str:
units = units.lower()
if not units.endswith("s"):
units = f"{units}s"
return {
"nanoseconds": "ns",
"microseconds": "us",
"milliseconds": "ms",
"seconds": "s",
"minutes": "m",
"hours": "h",
"days": "D",
}[units]
Resulting in KeyError
:
KeyError: 'minutes since 2020-01-02 18:00:00s'
What is the goal of converting XTIME
from datetime64
to timedelta64
? Maybe it needs to be assigned to a new variable in the dataset if an actual timedelta64
is needed for something?
Maybe the intent of this code was to handle the case where a user chooses not to decode_times
with Xarray and it is actual minutes instead of a datetime64
object?
Happy to open a PR to work around this, but wanted to understand the intent here first. Thanks!
Hi @DanielAdriaansen, thanks for the investigation of this bug. This is indeed something we were discussing at the very beginning of the project and it might turn out to not be quite correct in the way we do it currently.
Fundamentally, WRF saves two types of time variables: Times
which is a |S19
-type of the time-step of each frame in the netcdf-file and XTIME, which is of Int
-type (?) and which gives the offset from the start of simulation. At the moment, we are using the Times
variable to generate the Time
coordinate as datetime64
. Because we didn't want to have exactly the same information in XTIME, Times and Time we decided that it might be convenient for people if XTIME were actually a timedelta
-offset as it is indicated in the variable description. However, it seems like there are some conventions in Xarray about the units for timedelta
, which we don't fulfil, causing this bug.
As far as I understand the conventions around times (and please do correct me here) is that the unit string minutes since <date>
in a netcdf file gets converted to datetime
units in xarray and vice-versa. So I think it might be more in line with CF-complicance if we either convert XTIME to datetime
or just leave it as it is. What do you think? Also @jthielen
Sorry for the delayed response on this! I definitely concur with @lpilz's points, and will just add that part of the problem may be due to this time format conversion originating from the original xwrf implementation (e.g., https://ncar.github.io/esds/posts/2021/xarray-wrf-example/#installing-xwrf) when xwrf worked through the io backend of xarray rather than an after-open accessor, so the order of operations was likely different. We should probably revisit the design decisions around what time coordinate(s) xWRF's post-processed output presents to users, and make sure that our implementations of those play nicely with what xarray does by default in both the decode_times
cases...I'm not sure what's best at this point in time.
I think my main conceptual problem with the situation is that when we apply the postprocess
function, we change the xarray object in a way that 1. Saving it does not work any more (because of the pyproj object not being serializable) and 2. Even if one excludes this variable, the resulting file is wildly different than the original.
One possible remedy (which probably would take some engineering effort) is to build a .xwrf.to_netcdf
function, which saves it as a WRF-style netcdf. This would make it easier to just pick-and-choose which (time-) variables we want to retain and how they should look because we could go as far as we want from the underlying WRF-style file.
However, since this will probably take some time I would propose that we just remove the treatment of XTIME
for the time being and just let xarray
handly encoding and decoding...
What happened?
I am not sure if this is intended to be a supported behaviour, but it seems like it probably shouldn't error like this at least. I am trying to convert wrfout files to something at least close to cf-compliant, I realise xwrf is not yet complete in this regard.
Minimal Complete Verifiable Example
No response
Relevant log output
No response
Environment
Anything else we need to know?
No response