Closed DifferentialOrange closed 2 years ago
Class inheritance should be improved before review.
>>> dt = tarantool.Datetime(year = 1970, month = 1, day = 2)
>>> dt
Timestamp('1970-01-02 00:00:00')
>>> type(dt)
<class 'tarantool.msgpack_ext.types.datetime.Datetime'>
>>> dt.floor('H')
Timestamp('1970-01-02 00:00:00')
>>> type(dt.floor('H'))
<class 'pandas._libs.tslibs.timestamps.Timestamp'>
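The effect in the session above is not specific to pandas: a method inherited from a base type builds its result with the base type unless the subclass overrides it. A minimal stdlib-only sketch, using hypothetical names (`Label`, `Wrapped`) that are not part of the connector, reproduces it on `str` and shows composition as one possible way around it:

```python
class Label(str):
    """Subclass that inherits all str methods unchanged."""


class Wrapped:
    """Composition: the inner value is private, every method is explicit."""

    def __init__(self, value):
        self._value = str(value)

    def upper(self):
        # Re-wrap the result so the caller never sees the inner type.
        return Wrapped(self._value.upper())

    def __repr__(self):
        return f"Wrapped({self._value!r})"


inherited = Label("msk").upper()
wrapped = Wrapped("msk").upper()
print(type(inherited).__name__)  # str
print(type(wrapped).__name__)    # Wrapped
```

With composition the public surface is only what the wrapper defines, at the cost of writing each method by hand.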
Reworked
@oleg-jukovec, thank you for pointing out the issue with custom timezones. Now we encode abbreviated timezones to datetime.timezone with the expected offset and Tarantool name. Decoding them is also supported.
If a user wants to create a tarantool.Datetime with a Tarantool abbreviated timezone, they can build a custom timezone based on our autogenerated timezone info:
import datetime

import tarantool
import tarantool.msgpack_ext.types.timezones as tt_timezones

# Build a fixed-offset timezone from the autogenerated abbreviation table.
tzinfo = datetime.timezone(
    datetime.timedelta(minutes=tt_timezones.timezoneAbbrevInfo['MSK']['offset']),
    name='MSK'
)
dt = tarantool.Datetime(
    year=2022, month=8, day=31, hour=18, minute=7, second=54,
    microsecond=308543, nanosecond=321, tzinfo=tzinfo
)
I'm not sure we should expose tt_timezones.timezoneAbbrevInfo in the main tarantool module, but this is debatable.
Tarantool datetime may provide timezone info in two forms: tzoffset and tzindex.
These correspond to the tzoffset and tz arguments of Lua datetime.new{}. The tz string is mapped to a tzindex based on the timezones.h header.
In the Lua API, a user can set only tz (same as tzindex) or tzoffset. If tz (same as tzindex) is set, tzoffset is computed from the tz info. Both tzindex and tzoffset are written to the msgpack data if tzindex is set, but tzoffset is expected to be consistent with the tzindex info: see tarantool/tarantool#7680.
What should we use to store timezone info? The solution should satisfy several criteria:
Let's discuss both of them.
We should be able to unambiguously encode it to the same msgpack, i.e. the same tzindex and tzoffset info. Actually, it's trickier: not all timezones have a fixed offset. For example, tzoffset for the 'Europe/Moscow' timezone is 180 for 1.1.2008 and 240 for 1.7.2008. So what we actually want is to preserve tzindex and to have the same tzoffset as Tarantool would have for the same timestamp + tz (and this is really important, see tarantool/tarantool#7680 again).
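The two offsets quoted above can be checked with the standard library. This is only a sketch: it uses `zoneinfo` (Python 3.9+) as a stand-in, which reads the same IANA (Olson) database and needs a system tz database or the `tzdata` package to be installed; `tzoffset_minutes` is a hypothetical helper, not connector API.

```python
import datetime
from zoneinfo import ZoneInfo  # Python 3.9+, needs tz data on the system


def tzoffset_minutes(tz_name, naive_local):
    """Return the UTC offset, in minutes, that tz_name has at naive_local."""
    aware = naive_local.replace(tzinfo=ZoneInfo(tz_name))
    return int(aware.utcoffset().total_seconds() // 60)


winter = tzoffset_minutes("Europe/Moscow", datetime.datetime(2008, 1, 1))
summer = tzoffset_minutes("Europe/Moscow", datetime.datetime(2008, 7, 1))
print(winter, summer)  # 180 240
```

The same zone name yields different offsets depending on the date, which is why storing only a fixed offset is not enough for named timezones.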
What does it mean that it should be useful? It should be not just reference info, but a real timezone. If a user wants to do something with the datetime info, it should behave appropriately. Again, for example, if a user gets 1.1.2008 with 'Europe/Moscow' and then adds half a year with some default method, the result should be 1.7.2008 with 'Europe/Moscow', taking winter time into account. This is also important because we encode timestamp values as time since 1.1.1970 UTC, and that depends on things like winter time.
There is the pytz library (already a dependency of pandas) that implements the Olson database, which Tarantool uses to compute tzoffset for named timezones. We may use pytz to work with timezone info. All Tarantool ZONE_UNIQUEs and ZONE_ALIASes are supported by pytz. Tarantool also has ZONE_ABBREVs: timezones with a name and a fixed offset. pytz doesn't know about most of them, but they are easy to implement manually with pytz.FixedOffset or datetime.timezone(datetime.timedelta), based on the offset info from the timezones.h header.
It is rather inconvenient to store tzindex (or the tz name) with existing pytz or datetime tools. For example, we need to distinguish fixed offset timezones with a name from those without one. You can set a name for datetime.timezone(datetime.timedelta), but it can be retrieved only with a tz.tzname(dt) call. datetime.timezone generates the name on a tzname call, and there is no non-intrusive way to distinguish an autogenerated name from an explicitly set one. pytz.FixedOffset cannot have any name at all (except for pytz.FixedOffset(0), which is actually UTC). So it looks like the only way is to decode Tarantool timezones to a custom tarantool.Timezone type.
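The naming problem with datetime.timezone can be seen with the standard library alone. A small sketch (variable names are illustrative only):

```python
import datetime

offset = datetime.timedelta(hours=3)
anonymous = datetime.timezone(offset)     # no name given
named = datetime.timezone(offset, "MSK")  # explicit name
# A zone whose explicit name equals the autogenerated one.
lookalike = datetime.timezone(offset, anonymous.tzname(None))

print(anonymous.tzname(None))  # UTC+03:00
print(named.tzname(None))      # MSK
print(anonymous == named)      # True
print(lookalike.tzname(None) == anonymous.tzname(None))  # True
```

Equality compares offsets only, and `tzname` cannot tell `lookalike` from `anonymous`, so the name has to be tracked somewhere outside the datetime.timezone object.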
Since we already use pytz.timezone, let's use pytz.FixedOffset as a base class for fixed timezone data.
This type should be useful, so it should implement the standard datetime.tzinfo interface (utcoffset, tzname and dst). It would simply expose the utcoffset, tzname and dst methods of the underlying timezone data class. With some additional handles, it would expose tzindex (or the Tarantool tz name) and a copy of the underlying class (just in case).
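A rough, stdlib-only sketch of such a delegating type follows. `NamedTimezone` and its `name` property are hypothetical and only illustrate the idea; this is not the connector's tarantool.Timezone, and datetime.timezone stands in for the pytz data underneath.

```python
import datetime


class NamedTimezone(datetime.tzinfo):
    """Hypothetical wrapper: delegates to a base tzinfo and remembers a name."""

    def __init__(self, base, name=None):
        self._base = base
        self._name = name  # None means a plain offset without a Tarantool name

    def utcoffset(self, dt):
        return self._base.utcoffset(dt)

    def dst(self, dt):
        return self._base.dst(dt)

    def tzname(self, dt):
        # Prefer the explicit name, fall back to whatever the base reports.
        return self._name if self._name is not None else self._base.tzname(dt)

    @property
    def name(self):
        return self._name


base = datetime.timezone(datetime.timedelta(hours=3))
msk = NamedTimezone(base, name="MSK")
plain = NamedTimezone(base)
dt = datetime.datetime(2022, 8, 31, 18, 7, 54, tzinfo=msk)
print(dt.tzname(), dt.utcoffset())  # MSK 3:00:00
print(plain.name)                   # None
```

Because the name lives in its own attribute, a named zone and an unnamed one with the same offset stay distinguishable.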
tarantool.Datetime supports building from a pandas.Timestamp or with a tzinfo argument. The tzinfo argument or pandas.Timestamp.tzinfo may not be a tarantool.Timezone. Using only tarantool.Timezone in tarantool.Datetime is a way to ensure that everything is symmetrical on encode/decode.
So there are two possible ways: [...] tarantool.Timezone, [...] tarantool.Timezone.
If we cannot accept any other timezones, it would be another burden on the user's shoulders. To improve the experience, we may provide some migration advice.
On the other hand, the conversion may not work the way the user expects. Since [...] tarantool.Timezone on tarantool.Datetime initialization.
To stay on the safe side, we will strongly recommend using tarantool.Timezone by emitting warnings. The warning message would describe exactly how we converted tzinfo, so the user can always check whether their expectations match what we do.
Let's describe the conversion rules. If tzinfo is a pytz base class (pytz.tzinfo.BaseTzInfo), we check its .zone attribute, and if it is not None, we use it as the timezone name. As a result, tarantool.Datetime(name=zone) and pytz.timezone(zone) would have the same zone underneath.
If tzinfo is not a pytz base class, we call tz.tzname(dt), defined by the interface, to get a timezone name. For example, pytz._FixedOffset (it is not an instance of pytz.tzinfo.BaseTzInfo) has a None name. We do not use tz.tzname(dt) for a pytz base class because its output is frustrating. For example, tz.tzname(dt) for pytz.timezone('Europe/Moscow') is either MSK or MSD, and tarantool.Datetime(name='Europe/Moscow') and tarantool.Datetime(name='MSK') are different timezones.
If a timezone has a name that is unknown to Tarantool, we raise an error. If a timezone has a None name, we treat it as a fixed offset zone.
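The conversion rules can be sketched roughly as follows. This is an illustration under stated assumptions, not the connector's code: `resolve_timezone`, `KNOWN_NAMES` and `Unnamed` are hypothetical, the known-name set is a made-up subset, and a duck-typed `.zone` lookup stands in for the `isinstance(tzinfo, pytz.tzinfo.BaseTzInfo)` check so the sketch runs without pytz.

```python
import datetime
import warnings

# Hypothetical subset of names known to Tarantool (the real list is in timezones.h).
KNOWN_NAMES = {"Europe/Moscow", "MSK", "UTC"}


class Unnamed(datetime.tzinfo):
    """Fixed +03:00 zone whose tzname() is None, similar to pytz._FixedOffset."""

    def utcoffset(self, dt):
        return datetime.timedelta(hours=3)

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return None


def resolve_timezone(tzinfo, dt):
    """Return (name, offset_minutes) for tzinfo at the moment dt."""
    # Assumption: objects with a .zone attribute are treated like pytz zones.
    name = getattr(tzinfo, "zone", None)
    if name is None:
        name = tzinfo.tzname(dt)
    if name is not None and name not in KNOWN_NAMES:
        raise ValueError(f"timezone name {name!r} is unknown to Tarantool")
    offset = int(tzinfo.utcoffset(dt).total_seconds() // 60)
    # Tell the user exactly what the conversion produced.
    warnings.warn(f"tzinfo converted to name={name!r}, offset={offset} minutes")
    return name, offset


moment = datetime.datetime(2022, 8, 31)
msk = datetime.timezone(datetime.timedelta(hours=3), "MSK")
print(resolve_timezone(msk, moment))        # ('MSK', 180)
print(resolve_timezone(Unnamed(), moment))  # (None, 180)
```

Note that a datetime.timezone created without a name reports an autogenerated one such as `UTC+03:00`, which this sketch would reject as unknown.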
We will not implement a tzindex/tzoffset consistency check for now. We will wait for updates on tarantool/tarantool#7680.
to decode tarantool timezones to custom tarantool.Timezone type.
Well, actually, you simply can't do it. pandas cannot work with an arbitrary datetime.tzinfo instance: it works only with pytz or dateutil timezones.
https://github.com/pandas-dev/pandas/issues/15986#issuecomment-315054517 https://github.com/sdispater/pendulum/issues/131
The pandas source code is full of workarounds that detect whether a timezone is a pytz one and mess with its internals.
After a discussion with @oleg-jukovec, we decided to make the tarantool.Datetime API the same as in the Tarantool Lua datetime module. You can build a datetime from a msgpack payload or with the same API as in Tarantool. The object exposes all properties required to convert it to any other datetime (year, month, day, hour, minute, sec, nsec, timestamp, tzoffset, tz; names are the same except for minute instead of min, since min is a built-in function in Python), but it does not support built-in conversions to pandas and does not expose the internal pandas.Timestamp or pytz timezone, to simplify the behavior.
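As an illustration of converting through those properties, here is a sketch that builds a stdlib datetime.datetime from any object exposing them. `to_stdlib_datetime` is a hypothetical helper, and a `SimpleNamespace` stands in for a tarantool.Datetime so the snippet runs without the connector; it also assumes tzoffset is in minutes, as in the discussion above.

```python
import datetime
from types import SimpleNamespace


def to_stdlib_datetime(dt):
    """Build datetime.datetime from the exposed properties (nanoseconds are truncated)."""
    tz = datetime.timezone(datetime.timedelta(minutes=dt.tzoffset))
    return datetime.datetime(
        dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.sec,
        dt.nsec // 1000,  # datetime.datetime stores microseconds at most
        tzinfo=tz,
    )


# Stand-in object with the same property names as described above.
stand_in = SimpleNamespace(
    year=2022, month=8, day=31, hour=18, minute=7, sec=54,
    nsec=308543321, tzoffset=180,
)
print(to_stdlib_datetime(stand_in).isoformat())
# 2022-08-31T18:07:54.308543+03:00
```

The last three digits of nsec are lost in this direction, which is the precision limit of datetime.datetime.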
@oleg-jukovec, @LeonidVas, a new revision has been uploaded; humbly requesting one more review iteration.
I think this is a clearer solution than previous.
Yeah, it definitely is. Thank you for your advice; I was dissatisfied with the previous version of my implementation too.
msgpack: support datetime extended type
Tarantool supports datetime type since version 2.10.0 [1]. This patch introduces support for the Tarantool datetime type in msgpack decoders and encoders.
Tarantool datetime objects are decoded to the tarantool.Datetime type. tarantool.Datetime may be encoded to Tarantool datetime objects. tarantool.Datetime stores data in a pandas.Timestamp object. You can create tarantool.Datetime objects either from msgpack data or by using the same API as in Tarantool:
tarantool.Datetime exposes year, month, day, hour, minute, sec, nsec and timestamp properties if you need to convert tarantool.Datetime to any other kind of datetime object:
pandas.Timestamp was chosen to store data because it can store both nanoseconds and timezone information. The built-in Python datetime.datetime supports microseconds at most, and numpy.datetime64 does not support timezones.
Tarantool datetime interval type is planned to be stored in a custom type tarantool.Interval, and we'll need a way to support arithmetic between datetime and interval. This is the main reason we use a custom class instead of plain pandas.Timestamp. It is also hard to implement Tarantool-compatible timezones with full conversion support without custom classes.
This patch does not yet introduce the support of timezones in datetime.
Part of #204
msgpack: support tzoffset in datetime
Support non-zero tzoffset in datetime extended type.
Use the tzoffset parameter to set up an offset timezone:
You may use the tzoffset property to get the timezone offset of a datetime object.
The offset timezone is built with pytz.FixedOffset(). The pytz module is already a dependency of pandas, but this patch adds it as a requirement in case that changes in the future.
This patch doesn't yet introduce the support of named timezones (tzindex).
Part of #204
msgpack: support tzindex in datetime
Support non-zero tzindex in datetime extended type. If both tzoffset and tzindex are specified, tzindex takes precedence (same as in Tarantool [1]).
Use the tz parameter to set up a timezone name:
You may use the tz property to get the timezone name of a datetime object.
pytz is used to build timezone info. The Tarantool index to Olson name map and the inverted one are built with the gen_timezones.sh script, based on the tarantool/go-tarantool script [2]. All Tarantool unique and alias timezones are present in the pytz.all_timezones list. Only the following abbreviated timezones from Tarantool are present in pytz.all_timezones (version 2022.2.1):
pytz does not natively support abbreviated timezones because they may be ambiguous [3-5]. Tarantool itself does not support ambiguous abbreviated timezones:
If an ambiguous timezone is specified, an exception is raised.
Tarantool header timezones.h [6] provides a map for all abbreviated timezones with category info (all ambiguous timezones are marked with the TZ_AMBIGUOUS flag) and offset info. We parse this info to build a pytz.FixedOffset() timezone for each Tarantool abbreviated timezone not supported natively by pytz.
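A minimal sketch of that lookup, assuming the header has already been parsed into a dictionary. The rows in `ABBREV_ROWS` are hypothetical (the `XST` entry is made up), `abbrev_timezone` is not connector API, and datetime.timezone is swapped in for pytz.FixedOffset() so the snippet needs only the standard library:

```python
import datetime

# Hypothetical rows; the real values are parsed from Tarantool's timezones.h.
ABBREV_ROWS = {
    "MSK": {"offset": 180, "ambiguous": False},
    "XST": {"offset": 60, "ambiguous": True},  # made-up ambiguous abbreviation
}


def abbrev_timezone(name):
    """Return a fixed-offset timezone for an unambiguous abbreviation."""
    row = ABBREV_ROWS[name]
    if row["ambiguous"]:
        raise ValueError(f"timezone {name!r} is ambiguous")
    return datetime.timezone(datetime.timedelta(minutes=row["offset"]), name)


msk = abbrev_timezone("MSK")
print(msk.tzname(None), msk.utcoffset(None))  # MSK 3:00:00
```

Calling `abbrev_timezone("XST")` raises ValueError, mirroring the behavior described for ambiguous abbreviations.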
Closes #204