Closed: charlesbluca closed this 11 months ago
@charlesbluca any chance of a cuDF-only reproducer?
The bug is that the dtype of the column representing the count
aggregation is wrong if the dataframe is empty:
import cudf
df = cudf.DataFrame({"a": [1], "c": ["foo"]}, dtype={"a": "int64", "c": "object"})
df.groupby('a').agg({'c': "count"}).dtypes
# c int64
# dtype: object
# But
df.iloc[:0].groupby('a').agg({'c': "count"}).dtypes
# c object
# dtype: object
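For comparison, here is a minimal sketch of the same check run against pandas, on the assumption that pandas is the reference behaviour cuDF is expected to match. The helper name `count_dtype` is hypothetical and only used for illustration; it is not part of cuDF or pandas.

```python
import pandas as pd

# Same data as the cuDF reproducer above, built with pandas instead.
df = pd.DataFrame({"a": [1], "c": ["foo"]})


def count_dtype(frame):
    """Return the dtype of column 'c' after a groupby count keyed on 'a'.

    Hypothetical helper, for illustration only.
    """
    return frame.groupby("a").agg({"c": "count"})["c"].dtype


non_empty_dtype = count_dtype(df)        # one-row frame
empty_dtype = count_dtype(df.iloc[:0])   # zero-row slice of the same frame

# A count is an integer regardless of how many rows were grouped,
# so the two dtypes should agree.
print(non_empty_dtype, empty_dtype)
```

If the two dtypes differ when the same helper is pointed at a cuDF frame, that would reproduce the mismatch described above.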
Describe the bug I'm encountering unexpected DataErrors and RuntimeErrors when attempting groupby aggregations on an empty dask-cudf dataframe that I do not encounter with 23.08; it's not immediately obvious if this change was intentional, as these generally seem like groupby aggs that should be somewhat trivial.
Steps/Code to reproduce bug
Expected behavior I would expect the same behavior as with dask-cudf 23.08, which in this case returned an empty dataframe for each of the above groupby aggs.
Additional context Noticed this while working on https://github.com/dask-contrib/dask-sql/pull/1220