Open jbrockmendel opened 2 years ago
For the _range methods, what if freq is a lower resolution than reso? e.g. date_range("2022", periods=3, freq="D", reso="ms")
If the to_ methods have inference, would the resolution of each argument be collected and the highest one chosen as the inferred reso? e.g. to_timedelta([timedelta(days=1), timedelta(seconds=1), timedelta(milliseconds=1)])
For the _range methods, what if freq is a lower resolution than reso? e.g. date_range("2022", periods=3, freq="D", reso="ms")
That wouldn't be a problem, would be identical to date_range("2022", periods=3, freq="D").astype("M8[ms]"). What would be a problem is the reverse, where freq is a higher resolution than reso, e.g. date_range("2022", periods=3, freq="ns", reso="s"). We'd probably need to disallow that.
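To make the "disallow that" rule concrete, here is a minimal sketch of the check it implies: a step is acceptable only if it is a whole number of the requested resolution. The helper name `validate_step_fits_reso` and the lookup table are hypothetical and stdlib-only; they are not pandas API, and the step is assumed to be given as an integer count of nanoseconds.

```python
# Hypothetical helper, not part of pandas: checks that a step size can be
# represented exactly at the requested resolution.
NANOS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def validate_step_fits_reso(step_nanos, reso):
    """Raise ValueError if a step of `step_nanos` nanoseconds is finer than `reso`."""
    if reso not in NANOS_PER_UNIT:
        raise ValueError(f"unsupported reso: {reso!r}")
    if step_nanos % NANOS_PER_UNIT[reso] != 0:
        raise ValueError(
            f"step of {step_nanos}ns cannot be represented exactly with reso {reso!r}"
        )


DAY_NANOS = 24 * 60 * 60 * 10**9

# freq="D" with reso="ms": a day is a whole number of milliseconds, so no error.
validate_step_fits_reso(DAY_NANOS, "ms")

# freq="ns" with reso="s": one nanosecond is finer than a second, so it is rejected.
try:
    validate_step_fits_reso(1, "s")
except ValueError as err:
    print(err)
```

A divisibility check also rejects steps such as 1.5 seconds at reso="s", which is stricter than comparing resolutions alone; whether that is the desired behavior is a design choice the thread leaves open.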
If the to_ methods have inference, would the resolution of each argument be collected and the highest one chosen as the inferred reso? e.g. to_timedelta([timedelta(days=1), timedelta(seconds=1), timedelta(milliseconds=1)])
In that particular case they are all pytimedelta objects, which all get microsecond resolution. Suppose instead we have to_timedelta([Timedelta(days=1)._as_unit(unit) for unit in ["s", "ms", "us", "ns"]]). I think the way I would implement this would be something like:
def array_to_timedelta(objs):
    try:
        res = array_to_timedelta_with_reso(objs, "ns")
    except OutOfBoundsTimedelta:
        try:
            res = array_to_timedelta_with_reso(objs, "us")
        [...]
    return res

def array_to_timedelta_with_reso(objs, reso):
    for item in objs:
        td = Timedelta(item)._as_unit(reso)  # <- will raise if either overflow or casting involves rounding
        [...]
This should avoid a major perf hit or API change for currently-working cases. The downside is it isn't inferring the best reso so much as the highest viable reso. Also wouldn't match scalar behavior.
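The fallback strategy above can be tried out without pandas. The sketch below is only an illustration under stated assumptions: each input is a plain `(value, unit)` pair rather than a Timedelta, the names `to_unit` and `infer_highest_viable_unit` are hypothetical, `OverflowError` stands in for OutOfBoundsTimedelta, and a conversion that would need rounding raises `ValueError` instead of triggering a fallback.

```python
# Illustrative sketch only: (value, unit) pairs stand in for pandas Timedelta objects.
NANOS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def to_unit(value, unit, target):
    """Convert `value` counted in `unit` to an int64 count of `target`.

    Raises OverflowError if the result does not fit in int64 and
    ValueError if the conversion would require rounding.
    """
    nanos = value * NANOS_PER_UNIT[unit]  # Python ints never overflow
    result, remainder = divmod(nanos, NANOS_PER_UNIT[target])
    if remainder:
        raise ValueError(f"{value}{unit} is not a whole number of {target}")
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{value}{unit} does not fit in int64 {target}")
    return result


def infer_highest_viable_unit(items):
    """Try units from finest to coarsest; return (unit, converted values)."""
    last_error = None
    for target in ("ns", "us", "ms", "s"):
        try:
            return target, [to_unit(v, u, target) for v, u in items]
        except OverflowError as err:
            last_error = err  # too large for this unit, try a coarser one
    raise last_error


# Mixed units that all fit in nanoseconds resolve to "ns".
unit, values = infer_highest_viable_unit([(1, "s"), (1, "ms"), (1, "us"), (1, "ns")])
print(unit, values)
```

Inputs that already work at nanosecond resolution succeed on the first pass, which is why currently-working cases see no extra cost. An input that overflows in nanoseconds while another needs nanosecond precision has no viable unit, and the rounding error propagates.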
would be identical to date_range("2022", periods=3, freq="D").astype("M8[ms]")
Okay that is reasonable. I think if constructors have arguments that allow multiple ways to specify resolutions (freq, dtype, reso), we should definitely document the "order of operations".
Since I could not find anything on this in the current release notes for 2.0.0 I wanted to ask if there are any updates on this issue?
There is now a "unit" keyword in date_range and timedelta_range that specifies resolution. Haven't done to_datetime and to_timedelta yet.
There is now a "unit" keyword in date_range and timedelta_range that specifies resolution. Haven't done to_datetime and to_timedelta yet.
The documentation is silent about what resolution is and what its possible values are. Please add a link to the possible values on this page: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html
Edit: if you pass an arbitrary string to unit, the resulting ValueError lists the allowed values: ValueError("'unit' must be one of 's', 'ms', 'us', 'ns'")
In 2.0 we'll support non-nanosecond datetime64 and timedelta64. ATM date_range, timedelta_range, to_datetime, and to_timedelta still are nano-only. This issue is about how to support non-nano in these functions.
Two main options: inference or a keyword. A keyword would be something like pd.date_range(start, end, periods=10, reso="ms"), and the default would be "ns". This is the simplest thing to implement, but adds more API surface.
Inference for date_range would look at start and end to determine the correct resolution. This could get messy if e.g. start and end have different resos. ATM I'm thinking this isn't worth it.
Inference for to_datetime (really in array_to_datetime) is more compelling, in part bc I expect to_datetime to be called by library code for e.g. io.