Open fjanoos opened 5 years ago
You should be getting a warning about this if you use the latest version of pandas. In the future, this behavior will change to return an object-dtype array full of pandas Timestamp objects. Unfortunately, NumPy doesn't have a built-in datetime dtype with time-zone support, so this is about the best we can do.
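For anyone landing here later, a small pandas-only sketch of the two representations NumPy can hold for tz-aware timestamps. The time zone and dates are arbitrary choices for illustration, not taken from the original report.

```python
import numpy as np
import pandas as pd

# A tz-aware index; "America/New_York" is just an example zone.
aware = pd.date_range("2020-01-01", periods=3, freq="D", tz="America/New_York")

# Default conversion keeps the time zone by falling back to an
# object-dtype array of tz-aware pandas.Timestamp values.
as_objects = aware.to_numpy()

# Explicitly asking for datetime64 drops the time zone and yields
# the equivalent UTC instants instead.
as_utc = aware.to_numpy(dtype="datetime64[ns]")

print(as_objects.dtype, type(as_objects[0]).__name__)
print(as_utc.dtype, as_utc[0])
```

Either way something is lost: the object array is slow and opaque to NumPy, and the datetime64 array no longer knows its zone.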
Just wanted to rekindle discussion here and ping @dcherian and @benbovy. The current workaround for a pandas DatetimeIndex with timezone info (dtype='datetime64[ns, EST]') is to drop the timezone piece, or to use to_index(), operate in pandas, and then reassign the time coordinate. See https://github.com/pydata/xarray/issues/1036 and https://github.com/pydata/xarray/issues/3163.
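A rough sketch of that workaround, in case it helps. It assumes the coordinate holds naive timestamps that are known to be UTC; the variable names, values, and target zone are made up for the example.

```python
import pandas as pd
import xarray as xr

# Naive timestamps understood to be UTC (the timezone piece already dropped).
utc_naive = pd.date_range("2020-01-01", periods=3, freq="D")
da = xr.DataArray([1.0, 2.0, 3.0], dims="time", coords={"time": utc_naive})

# Step out to pandas, do the timezone-aware work there...
idx = da["time"].to_index()
local = idx.tz_localize("UTC").tz_convert("America/New_York")

# ...then strip the tz again and reassign, so the coordinate stays
# a plain datetime64 array holding local wall-clock times.
da_local = da.assign_coords(time=local.tz_localize(None))

print(da_local["time"].values)
```

The catch is that the result no longer records which zone the wall-clock times belong to, so that has to be tracked separately (for example in attrs).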
If I'm following https://github.com/pydata/xarray/blob/master/design_notes/flexible_indexes_notes.md correctly, this is another potential example of improved user-friendliness: we could have timezone-aware indexes and therefore call pandas methods like pandas.core.indexes.datetimes.DatetimeIndex.tz_convert() directly as a DataArray method?
This would definitely be great for remote sensing data that is usually stored with UTC timestamps, but often analysis requires converting to local time.
I am confused on the following point after reading the indexing refactor design notes on removing IndexVariable.
If ds["time"] is a 1D indexed coordinate, is ds["time"].data ≡ ds.indexes["time"].data? If so, that would just be a pd.DatetimeIndex, which is timezone-aware, and then this problem is solved because we don't maintain a separate numpy array. Am I understanding this correctly?
No, unfortunately it is not possible to use a pandas.Index directly inside Variable.data, because pandas.Index is not compatible with the NumPy array API; in particular, it is stuck with 1D data. Instead, we will need to wrap the array in some adapter class to make it compatible. Ideally this wrapper would be a fully N-dimensional wrapper for pandas.Series objects, but for a first pass it would probably be fine to raise an error if indexing would create a higher dimensional array.
The bigger issue is that other parts of Xarray probably need updates to avoid assuming that all dtype objects are numpy.dtype instances.
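To make the "first pass" idea concrete, here is a minimal sketch of such an adapter. The class name TzAwareIndexAdapter and its exact surface are hypothetical and are not xarray's actual implementation; it only shows the shape of the idea, including the non-NumPy dtype that the rest of the code base would have to tolerate.

```python
import numpy as np
import pandas as pd


class TzAwareIndexAdapter:
    """Hypothetical 1D array-like wrapper around a pandas.Index."""

    def __init__(self, index):
        self._index = pd.Index(index)

    @property
    def dtype(self):
        # May be a pandas extension dtype (e.g. DatetimeTZDtype),
        # which is not a numpy.dtype instance.
        return self._index.dtype

    @property
    def shape(self):
        return (len(self._index),)

    @property
    def ndim(self):
        return 1

    def __array__(self, dtype=None, copy=None):
        # Falls back to an object array of Timestamps for tz-aware data.
        return np.asarray(self._index, dtype=dtype)

    def __getitem__(self, key):
        key = key if isinstance(key, tuple) else (key,)
        # Refuse anything that would produce a higher dimensional result.
        if len(key) != 1 or key[0] is None or np.ndim(key[0]) > 1:
            raise IndexError("this adapter only supports 1D indexing")
        result = self._index[key[0]]
        if isinstance(result, pd.Index):
            return type(self)(result)
        return result  # scalar, e.g. a tz-aware pandas.Timestamp


adapter = TzAwareIndexAdapter(
    pd.date_range("2020-01-01", periods=3, freq="D", tz="UTC")
)
print(adapter.dtype, adapter[1:].shape, adapter[0])
```

Something like adapter[:, None] raises IndexError here instead of silently building a 2D array, which is the first-pass behaviour described above.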
Problem Description
When using Dataset.from_dataframe (or DataArray.from_series) to convert a pandas dataframe whose DatetimeIndex has a timezone, xarray converts the datetime index into a nanosecond index rather than keeping it as a datetime-index type.
MCVE Code Sample
Expected Output
After removing the tz localization from the DatetimeIndex of the dataframe, the conversion to a Dataset preserves the time index (without converting it to nanoseconds).
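The original code sample is not shown here, so the following is only an illustrative sketch of the tz-removal path described above, with made-up column names and values rather than the reporter's data.

```python
import pandas as pd
import xarray as xr

# A small dataframe with a tz-aware DatetimeIndex (illustrative data).
index = pd.date_range("2020-01-01", periods=3, freq="D", tz="UTC", name="time")
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=index)

# Remove the tz localization from the index before converting.
naive_df = df.tz_localize(None)

ds = xr.Dataset.from_dataframe(naive_df)

# The time coordinate is a regular datetime64 index, with no timezone.
print(ds["time"].dtype)
print(ds.indexes["time"])
```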
Output of xr.show_versions()