Open lafrech opened 1 year ago
From the original implementation in https://github.com/pandas-dev/pandas/commit/6e04264250f27787075dc0cb1685e20e7a6af071, I think the error message could have been clearer: "components" there means keyword-only arguments.
Perhaps. This could be a won'tfix, then. Still it is a bit surprising.
# 1/ This works, following calls are equivalent AFAIU
Timestamp(2019, 10, 27, 1, 30, tz=tz)
Timestamp(year=2019, month=10, day=27, hour=1, minute=30, tz=tz)
# 2/ This works
Timestamp(year=2019, month=10, day=27, hour=1, minute=30, tz=tz, fold=fold)
# 3/ This does not work
Timestamp(2019, 10, 27, 1, 30, tz=tz, fold=fold)
Why? Pandas understands the positional args, so how come 3/ is not equivalent to 2/? I may lack time/skills to dig into the code and understand why, but it is not intuitive to me.
Besides, it worked with Pandas 1.5.
In any case, the error message is misleading. And probably the docs too.
@AlexKirko do you happen to remember, when adding fold, whether the other Timestamp components were supposed to be passed as keyword-only arguments when specifying fold?
I'm trying to determine whether, as originally implemented in https://github.com/pandas-dev/pandas/pull/31563, 3/ was never supposed to work, even in 1.5.
Hello, this is what I can remember or could dig up:
In the examples that we documented then, we instruct the user to build from a naive datetime or keywords:
pd.Timestamp(datetime.datetime(2019, 10, 27, 1, 30, 0, 0),
tz='dateutil/Europe/London', fold=0)
pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
tz='dateutil/Europe/London', fold=1)
We never say that the user can't build from positional arguments.
Now, the code responsible for the check is here:
if fold is not None:
    if fold not in [0, 1]:
        raise ValueError(
            "Valid values for the fold argument are None, 0, or 1."
        )
    if (ts_input is not _no_input and not (
            PyDateTime_Check(ts_input) and
            getattr(ts_input, "tzinfo", None) is None)):
        raise ValueError(
            "Cannot pass fold with possibly unambiguous input: int, "
            "float, numpy.datetime64, str, or timezone-aware "
            "datetime-like. Pass naive datetime-like or build "
            "Timestamp from components."
        )
From what I remember and see in the code, we intended to raise an exception here only when we are trying to build from a non-ambiguous datetime-like input: one that passes PyDateTime_Check and has tzinfo (or does not pass PyDateTime_Check, I suppose).
I would say the behavior in 2.0 is a bug. If we support the positional argument convention to mimic datetime, then passing the equivalent of a naive datetime through positional arguments should behave the same as a naive datetime.
My guess would be that something in 2.0 makes us pass a ts_input that is not _no_input in @lafrech's example? Could try to chase this down during the weekend. @mroeschke, what do you think?
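For comparison, here is a minimal sketch of the convention being mimicked, using only the standard-library datetime (not pandas). It shows that datetime accepts fold as a keyword alongside positional components and gives the same result as the all-keyword form. The variable names are illustrative.

```python
from datetime import datetime

# Positional components plus the keyword-only fold argument (stdlib datetime).
positional = datetime(2019, 10, 27, 1, 30, fold=1)

# The same naive datetime, built entirely from keyword arguments.
keyword = datetime(year=2019, month=10, day=27, hour=1, minute=30, fold=1)

# Check the fold attribute explicitly in addition to comparing the values.
print(positional == keyword)
print(positional.fold, keyword.fold)
print(positional.tzinfo)
```

If Timestamp follows this convention, cases 2/ and 3/ from the report would be expected to behave alike.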
If we support the positional argument convention to mimic datetime, then passing the equivalent of a naive datetime through positional arguments should behave the same as a naive datetime.
Spot on. That would be consistent, indeed.
Could try to chase this down during the weekend.
Thanks @AlexKirko.
Hm, on 1.5.3 fold is actually None before this check (on main it's 0):
# Allow fold only for unambiguous input
if fold is not None:
    if fold not in [0, 1]:
        raise ValueError(
            "Valid values for the fold argument are None, 0, or 1."
        )
Also, on 1.5.3 ts_input is a datetime.datetime object, and on main it's an int (just has the year in it). We then try to get a timezone from an int, and this is where we error out.
I'll go through the constructor code and see what changed.
Somehow we used to go into this if in this situation:
if tzinfo is not None:
    # GH#17690 tzinfo must be a datetime.tzinfo object, ensured
    # by the cython annotation.
    if tz is not None:
        if (is_integer_object(tz)
            and is_integer_object(ts_input)
            and is_integer_object(freq)
        ):
            # GH#31929 e.g. Timestamp(2019, 3, 4, 5, 6, tzinfo=foo)
            # TODO(GH#45307): this will still be fragile to
            # mixed-and-matched positional/keyword arguments
            ts_input = datetime(
                ts_input,
                freq,
                tz,
                unit or 0,
                year or 0,
                month or 0,
                day or 0,
                fold=fold or 0,
            )
            nanosecond = hour
            tz = tzinfo
            print("auxilary constructor")
            print(ts_input)
            return cls(ts_input, nanosecond=nanosecond, tz=tz)
        raise ValueError('Can provide at most one of tz, tzinfo')
Now we no longer do, which looks correct, but where before we would pass a timezone-aware datetime without the fold argument, we now keep going and pass positional arguments.
Looks like the culprit is the predicate. We currently have:
if (ts_input is not _no_input and not (
        PyDateTime_Check(ts_input) and
        getattr(ts_input, "tzinfo", None) is None)):
Which is equivalent to:
if (ts_input is not _no_input and (
        not PyDateTime_Check(ts_input) or
        getattr(ts_input, "tzinfo", None) is not None)):
I think we should have:
if (ts_input is not _no_input and
        PyDateTime_Check(ts_input) and
        getattr(ts_input, "tzinfo", None) is not None):
This fixes the OP's problem; let me see if this passes the test suite.
Of course not. Anyway, I'll find a predicate variant that does what we want and passes the tests.
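The three predicate variants quoted in this comment can be compared side by side in plain Python. This is only a rough model, not the Cython code: isinstance(obj, datetime) stands in for PyDateTime_Check, _no_input is a local stand-in for the pandas sentinel, and the function names current, rewritten and proposed are made up for illustration.

```python
from datetime import datetime, timezone

_no_input = object()  # local stand-in for the pandas sentinel (assumption)


def is_datetime(obj):
    # Pure-Python stand-in for PyDateTime_Check (assumption).
    return isinstance(obj, datetime)


def current(ts_input):
    # The predicate as quoted from the constructor.
    return ts_input is not _no_input and not (
        is_datetime(ts_input) and getattr(ts_input, "tzinfo", None) is None)


def rewritten(ts_input):
    # The same predicate after applying De Morgan's law.
    return ts_input is not _no_input and (
        not is_datetime(ts_input) or getattr(ts_input, "tzinfo", None) is not None)


def proposed(ts_input):
    # The variant suggested in this comment.
    return (ts_input is not _no_input and is_datetime(ts_input)
            and getattr(ts_input, "tzinfo", None) is not None)


samples = {
    "no input": _no_input,
    "int (year only)": 2019,
    "str": "2019-10-27 01:30",
    "naive datetime": datetime(2019, 10, 27, 1, 30),
    "aware datetime": datetime(2019, 10, 27, 1, 30, tzinfo=timezone.utc),
}

# True means "raise ValueError when fold is passed".
results = {name: (current(v), rewritten(v), proposed(v)) for name, v in samples.items()}
for name, row in results.items():
    print(f"{name:16} current={row[0]} rewritten={row[1]} proposed={row[2]}")
```

In this model the first two variants always agree, and both reject an int ts_input, which matches the failure in case 3/. The proposed variant lets the int through, but it also stops rejecting str input, which the error message lists as something that should be rejected. Whether that is what the test suite objects to is not verified here.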
Also, this looks like a kludge to me:
elif is_integer_object(year):
    # User passed positional arguments:
    # Timestamp(year, month, day[, hour[, minute[, second[,
    # microsecond[, tzinfo]]]]])
    ts_input = datetime(ts_input, year, month, day or 0,
                        hour or 0, minute or 0, second or 0, fold=fold or 0)
    unit = None
In fact, ts_input is the year (which is what the datetime constructor expects).
print(ts_input)
print(_date_attributes)
> 2020
> [1, 1, 0, 0, None, None, None, None]
This should probably be carefully fixed in a separate PR, if it's not just me finding it weird that month is assigned to the variable named year.
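The name shift can be reproduced with a toy function. This is a sketch under assumptions: toy_constructor is hypothetical, and its parameter order (ts_input first, then year, month, day, ...) is inferred from the snippet above rather than copied from the pandas source.

```python
from datetime import datetime

_no_input = object()  # local stand-in for the pandas sentinel (assumption)


def toy_constructor(ts_input=_no_input, year=None, month=None, day=None,
                    hour=None, minute=None, second=None, microsecond=None,
                    tzinfo=None, *, tz=None, fold=None):
    # Hypothetical: report which parameter name each argument was bound to.
    return dict(ts_input=ts_input, year=year, month=month, day=day,
                hour=hour, minute=minute, second=second, fold=fold)


# Positional call as in case 3/: every component lands one name to the left.
bound = toy_constructor(2019, 10, 27, 1, 30, tz="Europe/London", fold=1)
print(bound)

# Rebuilding the datetime the way the quoted branch does undoes the shift.
rebuilt = datetime(bound["ts_input"], bound["year"], bound["month"],
                   bound["day"] or 0, bound["hour"] or 0,
                   bound["minute"] or 0, bound["second"] or 0,
                   fold=bound["fold"] or 0)
print(rebuilt, rebuilt.fold)
```

Here the year ends up in ts_input, the month in year, the day in month, and so on, which is why the quoted branch passes ts_input as the first argument to datetime.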
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Code works with Pandas 1.5. Possible Pandas 2.0 regression?
Search yielded #44378 except I do build from components.
When passing components as kwargs, it works:
I figured maybe I'm not using the constructor correctly but I couldn't find a mention in the docs preventing this usage.
The only test I found regarding this is this one:
https://github.com/pandas-dev/pandas/blob/ee84ef2753f396b6b9edb514dc2d1a72f78bd1ed/pandas/tests/scalar/timestamp/test_constructors.py#L839-L847
It is this test that led me to the "pass all as kwargs" solution.
Is this a bug or a known limitation?
Expected Behavior
Installed Versions