rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] cudf.Timestamp #12052

Open mattf opened 2 years ago

mattf commented 2 years ago

Is your feature request related to a problem? Please describe.
I am rewriting code from pandas to cudf, using import cudf as pd.

Describe the solution you'd like
A cudf.Timestamp matching https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html

vyasr commented 2 years ago

I suspect the main reason we haven't bothered with this in the past is that it is a scalar, and we tend to support pd.Timestamp directly in most places that would allow it, so the performance benefits are quite limited. Aside from strict API compatibility, the one other benefit I could see is if cudf.Timestamp existed more like cudf.Scalar, as a way to avoid a host-to-device (H2D) copy each time the value is used.
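To make the H2D point concrete, here is a minimal pure-Python sketch. It is not cudf code: `FakeDevice` and `CachedScalar` are hypothetical stand-ins, and the counter only simulates host-to-device transfers. The assumption being illustrated is that a host scalar is uploaded on every use, while a device-backed scalar is uploaded once and reused.

```python
import datetime


class FakeDevice:
    """Hypothetical stand-in for a GPU; it only counts simulated H2D copies."""

    def __init__(self):
        self.copies = 0

    def upload(self, value):
        self.copies += 1
        return ("device", value)


class CachedScalar:
    """Hypothetical scalar that uploads its host value at most once."""

    def __init__(self, value, device):
        self._value = value
        self._device = device
        self._device_value = None

    @property
    def device_value(self):
        # Copy to the (simulated) device lazily, then reuse the cached copy.
        if self._device_value is None:
            self._device_value = self._device.upload(self._value)
        return self._device_value


def use_host_value(device, host_value, n_ops):
    # Passing a host scalar into each operation uploads it every time.
    for _ in range(n_ops):
        device.upload(host_value)


def use_cached_scalar(scalar, n_ops):
    # Reusing a device-backed scalar triggers only the first upload.
    for _ in range(n_ops):
        _ = scalar.device_value


ts = datetime.datetime(2001, 1, 1, 0, 0, 1)

raw_device = FakeDevice()
use_host_value(raw_device, ts, n_ops=5)

cached_device = FakeDevice()
use_cached_scalar(CachedScalar(ts, cached_device), n_ops=5)

print(raw_device.copies, cached_device.copies)
```

Whether the saving matters in practice depends on how often the same timestamp is reused across operations.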

@brandon-b-miller @shwina what do you think about that (in the context of previous Scalar work)?

shwina commented 1 year ago

A Timestamp sounds like a cudf.Scalar with dtype=datetime64[resolution], and we can already construct those:

>>> import cudf, pandas as pd
>>> cudf.Scalar(pd.Timestamp('2001-01-01 00:00:01'))
Scalar(2001-01-01T00:00:01.000000, dtype=datetime64[us])

So should we just make it so that cudf.Timestamp is a cudf.Scalar (inheritance relationship)?
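A rough sketch of what that inheritance relationship could look like, in plain Python rather than cudf. Both classes here are hypothetical stand-ins (the real cudf.Scalar has a different constructor and holds device memory), and the `resolution` parameter and `year` accessor are assumptions made for illustration only.

```python
import datetime


class Scalar:
    """Hypothetical stand-in for cudf.Scalar: a value plus a dtype string."""

    def __init__(self, value, dtype):
        self.value = value
        self.dtype = dtype

    def __repr__(self):
        return f"{type(self).__name__}({self.value!r}, dtype={self.dtype})"


class Timestamp(Scalar):
    """Hypothetical cudf.Timestamp: a Scalar pinned to a datetime64 dtype."""

    def __init__(self, value, resolution="us"):
        if not isinstance(value, datetime.datetime):
            raise TypeError("Timestamp expects a datetime.datetime")
        super().__init__(value, dtype=f"datetime64[{resolution}]")

    # Datetime-style accessors are the part a plain Scalar would lack.
    @property
    def year(self):
        return self.value.year


ts = Timestamp(datetime.datetime(2001, 1, 1, 0, 0, 1))
print(ts, isinstance(ts, Scalar), ts.year)
```

Anything that accepts a Scalar would then accept a Timestamp unchanged, while the pandas-style accessors live only on the subclass.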

brandon-b-miller commented 1 year ago

I see no reason why the scalar machinery couldn't be used to back cudf.Timestamp from a technical perspective.

One question I have is what we would do with the cudf.Scalar constructor when a timestamp is passed. Would we return a cudf.Timestamp at that point, or would we still allow construction of a cudf.Scalar object of datetime dtype that isn't a cudf.Timestamp?

shwina commented 1 year ago

> Would we return a cudf.Timestamp at that point, or would we still allow construction of a cudf.Scalar object of datetime dtype that isn't a cudf.Timestamp?

Assuming we all like the inheritance relationship, I would strongly favor the latter over the former. Returning a subclass from a superclass constructor, à la cudf.Index, is frustrating to support.

It doesn't feel too icky to do the latter to me. For instance, pd.Timestamp inherits from datetime.datetime, which is a fully constructible and usable type in and of itself.
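The two constructor behaviors under discussion can be sketched in plain Python. All four classes below are hypothetical stand-ins, not cudf code: `Scalar`/`Timestamp` show the plain-constructor option, and `DispatchingScalar`/`DispatchingTimestamp` show the subclass-returning option that would need a custom `__new__`.

```python
import datetime


class Scalar:
    """Hypothetical stand-in for cudf.Scalar with an ordinary constructor."""

    def __init__(self, value):
        self.value = value


class Timestamp(Scalar):
    """Hypothetical stand-in for cudf.Timestamp."""


class DispatchingScalar:
    """The other option: the base constructor picks a subclass itself."""

    def __new__(cls, value):
        target = cls
        # Only redirect when the base class is called directly.
        if cls is DispatchingScalar and isinstance(value, datetime.datetime):
            target = DispatchingTimestamp
        return object.__new__(target)

    def __init__(self, value):
        self.value = value


class DispatchingTimestamp(DispatchingScalar):
    pass


dt = datetime.datetime(2001, 1, 1, 0, 0, 1)

plain = Scalar(dt)                   # stays a Scalar, even for datetimes
explicit = Timestamp(dt)             # caller opts in to the subclass
dispatched = DispatchingScalar(dt)   # base constructor returns a subclass

print(type(plain).__name__, type(explicit).__name__, type(dispatched).__name__)
```

With the plain constructor, `type(Scalar(x))` is always `Scalar`, which keeps subclassing, pickling, and type checks predictable. The dispatching version needs the base class to know about its subclasses and makes `type(DispatchingScalar(x))` depend on the argument.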

vyasr commented 1 year ago

Yeah, I'm a hard no on having cudf.Scalar(...) return a cudf.Timestamp. I'm fine with the inheritance relationship though.

While we're at it we should also look at #5882.