Open mattf opened 2 years ago
I suspect that the main reason we haven't bothered with this in the past is that it is a scalar and we tend to support pd.Timestamp directly in most places that would allow it, so the benefits performance-wise are quite limited. Aside from strict API compatibility concerns, the one other benefit that I could see here is if cudf.Timestamp existed more like cudf.Scalar, as a way to avoid needing constant H2D copies each time.
@brandon-b-miller @shwina what do you think about that (in the context of previous Scalar work)?
A Timestamp sounds like a cudf.Scalar with dtype=datetime64[resolution], and we can already construct those:
>>> import cudf
>>> import pandas as pd
>>> cudf.Scalar(pd.Timestamp('2001-01-01 00:00:01'))
Scalar(2001-01-01T00:00:01.000000, dtype=datetime64[us])
So should we just make it so that cudf.Timestamp is a cudf.Scalar (inheritance relationship)?
I see no reason why the scalar machinery couldn't be used to back cudf.Timestamp from a technical perspective.
One question I would have is what we would do with the cudf.Scalar constructor when a timestamp is passed. Would we return a cudf.Timestamp at that point, or would we still allow construction of a cudf.Scalar object of datetime dtype that isn't a cudf.Timestamp?
> Would we return a cudf.Timestamp at that point or would we still allow construction of a cudf.Scalar object of datetime dtype, that isn't a cudf.Timestamp?
Assuming we all like the inheritance relationship, I would strongly favor the latter over the former. Returning a subclass from a superclass ctor à la cudf.Index is frustrating to support.
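To make the concern concrete, here is a minimal pure-Python sketch of the "superclass constructor returns a subclass" pattern. This is not cudf code: `DispatchingScalar` and `DispatchingTimestamp` are hypothetical stand-ins, and the dispatch rule is an assumption chosen only for illustration.

```python
import datetime


class DispatchingScalar:
    """Hypothetical base class whose constructor may return a subclass."""

    def __new__(cls, value):
        # Dispatch only when the base class itself is being constructed;
        # without this guard, subclass construction would re-enter the dispatch.
        if cls is DispatchingScalar and isinstance(value, datetime.datetime):
            return super().__new__(DispatchingTimestamp)
        return super().__new__(cls)

    def __init__(self, value):
        self.value = value


class DispatchingTimestamp(DispatchingScalar):
    """Hypothetical datetime-specialized subclass."""


plain = DispatchingScalar(42)
stamp = DispatchingScalar(datetime.datetime(2001, 1, 1, 0, 0, 1))

# The caller asked for the base class but got a different type back.
print(type(plain).__name__, type(stamp).__name__)
```

With this pattern, callers can no longer rely on `type(DispatchingScalar(x)) is DispatchingScalar`, and every new subclass has to be threaded through the base class's `__new__`.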
It doesn't feel too icky to do the latter to me. For instance, pd.Timestamp inherits from datetime.datetime, which is a fully constructible and usable type in and of itself.
Yeah, I'm a hard no on having cudf.Scalar(...) return a cudf.Timestamp. I'm fine with the inheritance relationship though.
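A rough sketch of the shape being agreed on here: plain inheritance, with the base constructor never returning the subclass. `ScalarSketch` and `TimestampSketch` are hypothetical stand-ins rather than the real cudf.Scalar or a real cudf.Timestamp, and the dtype labeling rule is an assumption made only for this example.

```python
import datetime


class ScalarSketch:
    """Hypothetical stand-in for cudf.Scalar: a value plus a dtype label."""

    def __init__(self, value, dtype=None):
        self.value = value
        if dtype is None:
            # Assumed labeling rule for this sketch only; real dtype
            # inference is more involved.
            if isinstance(value, datetime.datetime):
                dtype = "datetime64[us]"
            else:
                dtype = type(value).__name__
        self.dtype = dtype

    def __repr__(self):
        return f"{type(self).__name__}({self.value!r}, dtype={self.dtype})"


class TimestampSketch(ScalarSketch):
    """Hypothetical stand-in for cudf.Timestamp, backed by the scalar class."""

    def __init__(self, value):
        if not isinstance(value, datetime.datetime):
            raise TypeError("TimestampSketch requires a datetime.datetime")
        super().__init__(value, dtype="datetime64[us]")

    @property
    def year(self):
        # Example of a convenience accessor that only the subclass offers.
        return self.value.year


moment = datetime.datetime(2001, 1, 1, 0, 0, 1)
as_scalar = ScalarSketch(moment)        # stays a plain scalar of datetime dtype
as_timestamp = TimestampSketch(moment)  # explicit opt-in to the subclass
```

Here a datetime-typed scalar that is not a timestamp remains constructible, much as datetime.datetime remains constructible alongside pd.Timestamp, and code that accepts the base class also accepts the subclass.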
While we're at it we should also look at #5882.
Is your feature request related to a problem? Please describe.
Rewriting code from pandas into cudf, using import cudf as pd
Describe the solution you'd like
cudf.Timestamp matching https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html