rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] dt.total_seconds #16802

Open MarcoGorelli opened 2 months ago

MarcoGorelli commented 2 months ago

Is your feature request related to a problem? Please describe.

I wish I could use cuDF to do .dt.total_seconds on a timedelta column

Describe the solution you'd like

.dt.total_seconds

Describe alternatives you've considered

Additional context

We're (for now) xfailing tests in Narwhals https://github.com/narwhals-dev/narwhals/pull/951

mroeschke commented 2 months ago

Thanks for the report!

Although this could be easily implemented by summing the timedelta components, I think there was a desire to implement a dedicated libcudf kernel for total_seconds, to avoid the n kernel launches needed to sum each individual component. (IIRC, cc @bdice, you may have been a part of that passing discussion somewhere.)
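For illustration, the component-summing approach mentioned above can be sketched in plain Python with the standard library's `datetime.timedelta`. This is not cuDF code: `SECONDS_PER_UNIT` and `total_seconds_from_components` are hypothetical names chosen for the sketch. In a columnar library, each term of the sum would be a separate column operation, which is the per-component launch cost the comment refers to.

```python
from datetime import timedelta

# Divisor that converts one unit of each stored component to seconds.
# A Python timedelta stores exactly these three normalized components.
SECONDS_PER_UNIT = {
    "days": 1 / 86_400,          # 1 day = 86,400 s, so days / (1/86400)
    "seconds": 1.0,
    "microseconds": 1_000_000.0,  # 1 s = 1,000,000 us
}


def total_seconds_from_components(td: timedelta) -> float:
    """Sum each component, scaled to seconds (hypothetical helper)."""
    total = 0.0
    for name, divisor in SECONDS_PER_UNIT.items():
        # One multiply/divide-and-add per component; on a GPU column this
        # would correspond to one elementwise pass per component.
        total += getattr(td, name) / divisor
    return total


if __name__ == "__main__":
    td = timedelta(days=1, hours=2, minutes=3, seconds=4, milliseconds=500)
    print(total_seconds_from_components(td), td.total_seconds())
```

The sketch only shows the arithmetic; how many kernel launches a real implementation would need depends on how the library fuses these operations.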

bdice commented 1 month ago

I think this might be simple, and may not require summing components. I think we can do a conversion/cast to duration_s and then cast that as a float type to match pandas.

edit: we may need to cast to the smallest duration type (nanos?) and then divide by the appropriate scale factor (1e9), in order to retain subsecond information.
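A minimal sketch of why the edit matters, assuming a timedelta is stored as an integer count of nanoseconds. The function names below are hypothetical and this is plain Python, not the libcudf or cuDF API. Casting to whole seconds first discards the fractional part, while dividing the nanosecond count by 1e9 keeps it.

```python
NS_PER_SECOND = 1_000_000_000


def total_seconds_via_seconds_cast(ns: int) -> float:
    """Cast to whole seconds first (truncating toward zero), then to float.

    Hypothetical helper: models the first idea, which loses subsecond data.
    """
    whole = abs(ns) // NS_PER_SECOND
    return float(whole if ns >= 0 else -whole)


def total_seconds_via_nanos(ns: int) -> float:
    """Keep nanosecond resolution, then divide by the scale factor 1e9.

    Hypothetical helper: models the approach described in the edit.
    """
    return ns / 1e9


if __name__ == "__main__":
    one_and_a_half_seconds = 1_500_000_000  # nanoseconds
    print(total_seconds_via_seconds_cast(one_and_a_half_seconds))
    print(total_seconds_via_nanos(one_and_a_half_seconds))
```

One caveat of the float result: a 64-bit float has a 53-bit significand, so nanosecond counts larger than 2**53 cannot all be represented exactly after conversion.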