Closed. MarcoGorelli closed this issue 8 months ago.
Thanks for reporting, Marco! A couple of side notes here, both of which I think you may already be aware of, but I'll include them for posterity:
In [14]: df.groupby('b').agg({'a': 'sum', 'c': 'mean'})
Out[14]:
   a    c
b
4  3  7.5
5  3  9.0
.apply() is going to be much slower than agg() in the general case. It works by iterating over the groups and applying the UDF to each group. In certain cases, we are able to JIT compile the input function to .apply(), and it can be very fast; see https://docs.rapids.ai/api/cudf/stable/user_guide/guide-to-udfs/#overview-of-user-defined-functions-with-cudf.
Hi @MarcoGorelli, thanks for reporting. This case wasn't handled by cuDF's apply post-processing machinery. I've opened a PR that should fix the problem.
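As a rough illustration of the earlier note that .apply() works by iterating over the groups and calling the UDF on each one, here is a plain-Python sketch of that idea. It is a conceptual model only, not cuDF's or pandas' actual implementation. The rows are hypothetical (the original df is not shown in this thread) and were chosen so the result matches the agg() output above; the helper names group_rows, apply_per_group and sum_a_mean_c are made up for this example.

```python
from collections import defaultdict

# Hypothetical data: the thread does not show the original df, so these
# rows are an assumption picked to reproduce the agg() output above.
rows = [
    {"a": 1, "b": 4, "c": 7.0},
    {"a": 2, "b": 4, "c": 8.0},
    {"a": 3, "b": 5, "c": 9.0},
]


def group_rows(rows, key):
    """Split rows into groups keyed by the value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def apply_per_group(rows, key, udf):
    """Conceptual model of groupby(key).apply(udf): one Python call per group."""
    return {k: udf(group) for k, group in group_rows(rows, key).items()}


def sum_a_mean_c(group):
    """Example UDF: sum of column 'a' and mean of column 'c' for one group."""
    a_total = sum(r["a"] for r in group)
    c_mean = sum(r["c"] for r in group) / len(group)
    return {"a": a_total, "c": c_mean}


result = apply_per_group(rows, "b", sum_a_mean_c)
print(result)  # {4: {'a': 3, 'c': 7.5}, 5: {'a': 3, 'c': 9.0}}
```

In this model the UDF is invoked once per group, so the Python-level overhead grows with the number of groups. That is one plausible reason a built-in reduction requested through agg() tends to be the faster choice when it can express the same computation.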
Describe the bug
In pandas I can do the following:
Steps/Code to reproduce bug
In cudf, however:
Expected behavior
Same output as pandas.