libdynd / dynd-python

Python exposure of dynd
http://libdynd.org

as_numpy on datetime array yields unexpected timezone result #143

Open mrocklin opened 9 years ago

mrocklin commented 9 years ago

Unless I think carefully about timezones, the following is confusing. Presumably numpy and dynd disagree about whether the default timezone is local or UTC.

In [1]: from dynd import nd

In [2]: d = nd.array(['2000-01-01', '2000-02-03'], dtype='datetime')

In [3]: x = nd.as_numpy(d, allow_copy=True)

In [4]: d
Out[4]: 
nd.array([2000-01-01T00:00, 2000-02-03T00:00],
         type="2 * datetime")

In [5]: x
Out[5]: array(['1999-12-31T16:00:00.000000-0800', '2000-02-02T16:00:00.000000-0800'], dtype='datetime64[us]')
mrocklin commented 9 years ago

Although fortunately

In [6]: nd.array(x)
Out[6]: 
nd.array([2000-01-01T00:00Z, 2000-02-03T00:00Z],
         type="2 * datetime[tz='UTC']")
mwiebe commented 9 years ago

Do you have any thoughts on what might be more reasonable behavior? Should In [3] from your example perhaps raise an error instead of implicitly adding the UTC timezone?

mrocklin commented 9 years ago

So what is numpy doing here? Is it using local time zone as default? No time zone? I guess I'd like for dynd and numpy to share common defaults if possible.

mwiebe commented 9 years ago

NumPy presently has no "naive time zone" support. It stores values as UTC and displays them using the local time zone. At SciPy I was talking to @ChrisBarker-NOAA and @charris about this exact issue, and I unfortunately dropped the ball on following up with them. Our conclusion was that it would be good to nudge numpy a little bit closer to the dynd defaults, by defaulting to a naive timezone there as well and fixing up the printing.
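To make the shift seen in Out[5] concrete, here is a minimal sketch that uses only Python's standard `datetime` module, not numpy or dynd. It assumes the local zone is a fixed UTC-8 offset, as the `-0800` suffix in that output suggests; the names `LOCAL`, `stored`, and `shown` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the reporter's machine is at a fixed UTC-8 offset.
LOCAL = timezone(timedelta(hours=-8))

# The naive values from the example, interpreted as UTC on storage.
stored = [
    datetime(2000, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2000, 2, 3, 0, 0, tzinfo=timezone.utc),
]

# Displaying a UTC instant in the local zone moves the wall-clock time.
shown = [t.astimezone(LOCAL).isoformat() for t in stored]
print(shown)
# ['1999-12-31T16:00:00-08:00', '2000-02-02T16:00:00-08:00']
```

The instants are unchanged; only the rendering differs, which is why midnight on 2000-01-01 shows up as 16:00 on the previous day.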

ChrisBarker-NOAA commented 9 years ago

For what it's worth, I think I've dropped the ball here -- or maybe I'm still carrying it.

We had some good discussion at SciPy, and came to some conclusions, but I have yet to clean up the notes and write it up properly, which I said I'd do. Stay tuned...

ChrisBarker-NOAA commented 9 years ago

@mrocklin: while numpy remains "broken", the trick is to add a UTC timezone specifier, even when you don't know the actual time zone -- using UTC everywhere is almost like having a naive time zone.
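The idea of treating everything as UTC can be sketched with the standard library alone. This is an illustration of the principle, not of numpy's or dynd's API, and the variable names are hypothetical: if every naive value is tagged as UTC and always rendered in UTC, the wall-clock fields never move.

```python
from datetime import datetime, timezone

# Naive wall-clock values, as in the original example.
naive = [datetime(2000, 1, 1), datetime(2000, 2, 3)]

# Tag each value as UTC without changing its wall-clock fields.
as_utc = [t.replace(tzinfo=timezone.utc) for t in naive]
print([t.isoformat() for t in as_utc])
# ['2000-01-01T00:00:00+00:00', '2000-02-03T00:00:00+00:00']

# Reading the values back in UTC and dropping the tag recovers the input.
round_trip = [t.astimezone(timezone.utc).replace(tzinfo=None) for t in as_utc]
print(round_trip == naive)  # True
```

The "almost" matters: this holds only as long as nothing converts the values to a different zone for display or arithmetic.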