rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] groupby(...).apply raises when aggregation returns multiple outputs #15084

Closed MarcoGorelli closed 8 months ago

MarcoGorelli commented 9 months ago

Describe the bug

In pandas I can do the following:

In [12]: df = pd.DataFrame({'a': [1,2,3], 'b': [4,4,5], 'c': [7,8,9]})

In [13]: df.groupby('b', as_index=False).apply(lambda df: pd.Series([df['a'].sum(), df['c'].mean()], index=['a_sum', 'c_mean']), include_groups=False)
Out[13]:
   b  a_sum  c_mean
0  4    3.0     7.5
1  5    3.0     9.0

Steps/Code to reproduce bug

In cudf, however:

import cudf

df = cudf.DataFrame({'a': [1,2,3], 'b': [4,4,5], 'c': [7,8,9]})
df.groupby('b', as_index=False).apply(lambda df: cudf.Series([df['a'].sum(), df['c'].mean()], index=['a_sum', 'c_mean']))

TypeError                                 Traceback (most recent call last)
<ipython-input-15-41e17871a9cc> in <cell line: 1>()
----> 1 df.groupby('b', as_index=False).apply(lambda df: cudf.Series([df['a'].sum(), df['c'].mean()], index=['a_sum', 'c_mean']))

4 frames
/usr/local/lib/python3.10/dist-packages/cudf/core/groupby/groupby.py in _post_process_chunk_results(self, chunk_results, group_names, group_keys, grouped_values)
   1312                 result.index = cudf.MultiIndex._from_data(index_data)
   1313             else:
-> 1314                 raise TypeError(
   1315                     "Error handling Groupby apply output with input of "
   1316                     f"type {type(self.obj)} and output of "

TypeError: Error handling Groupby apply output with input of type <class 'cudf.core.dataframe.DataFrame'> and output of type <class 'cudf.core.series.Series'>

Expected behavior

Same output as pandas.

Environment details (output of the cudf/print_env.sh script)

     ***git***
     Not inside a git repository

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=22.04
     DISTRIB_CODENAME=jammy
     DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
     PRETTY_NAME="Ubuntu 22.04.3 LTS"
     NAME="Ubuntu"
     VERSION_ID="22.04"
     VERSION="22.04.3 LTS (Jammy Jellyfish)"
     VERSION_CODENAME=jammy
     ID=ubuntu
     ID_LIKE=debian
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     UBUNTU_CODENAME=jammy
     Linux 805bca70e0af 6.1.58+ #1 SMP PREEMPT_DYNAMIC Sat Nov 18 15:31:17 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Sun Feb 18 14:29:00 2024
     +---------------------------------------------------------------------------------------+
     | NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
     |-----------------------------------------+----------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
     |                                         |                      |               MIG M. |
     |=========================================+======================+======================|
     |   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
     | N/A   64C    P0              30W /  70W |    117MiB / 15360MiB |      0%      Default |
     |                                         |                      |                  N/A |
     +-----------------------------------------+----------------------+----------------------+

     +---------------------------------------------------------------------------------------+
     | Processes:                                                                            |
     |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
     |        ID   ID                                                             Usage      |
     |=======================================================================================|
     +---------------------------------------------------------------------------------------+

     ***CPU***
     Architecture:                       x86_64
     CPU op-mode(s):                     32-bit, 64-bit
     Address sizes:                      46 bits physical, 48 bits virtual
     Byte Order:                         Little Endian
     CPU(s):                             2
     On-line CPU(s) list:                0,1
     Vendor ID:                          GenuineIntel
     Model name:                         Intel(R) Xeon(R) CPU @ 2.20GHz
     CPU family:                         6
     Model:                              79
     Thread(s) per core:                 2
     Core(s) per socket:                 1
     Socket(s):                          1
     Stepping:                           0
     BogoMIPS:                           4399.99
     Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
     Hypervisor vendor:                  KVM
     Virtualization type:                full
     L1d cache:                          32 KiB (1 instance)
     L1i cache:                          32 KiB (1 instance)
     L2 cache:                           256 KiB (1 instance)
     L3 cache:                           55 MiB (1 instance)
     NUMA node(s):                       1
     NUMA node0 CPU(s):                  0,1
     Vulnerability Gather data sampling: Not affected
     Vulnerability Itlb multihit:        Not affected
     Vulnerability L1tf:                 Mitigation; PTE Inversion
     Vulnerability Mds:                  Vulnerable; SMT Host state unknown
     Vulnerability Meltdown:             Vulnerable
     Vulnerability Mmio stale data:      Vulnerable
     Vulnerability Retbleed:             Vulnerable
     Vulnerability Spec rstack overflow: Not affected
     Vulnerability Spec store bypass:    Vulnerable
     Vulnerability Spectre v1:           Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
     Vulnerability Spectre v2:           Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected
     Vulnerability Srbds:                Not affected
     Vulnerability Tsx async abort:      Vulnerable

     ***CMake***
     /usr/local/bin/cmake
     cmake version 3.27.9

     CMake suite maintained and supported by Kitware (kitware.com/cmake).

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
     Copyright (C) 2021 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***
     /usr/local/cuda/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2023 NVIDIA Corporation
     Built on Tue_Aug_15_22:02:13_PDT_2023
     Cuda compilation tools, release 12.2, V12.2.140
     Build cuda_12.2.r12.2/compiler.33191640_0

     ***Python***
     /usr/local/bin/python
     Python 3.10.12

     ***Environment Variables***
     PATH                            : /opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin
     LD_LIBRARY_PATH                 : /usr/lib64-nvidia
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    :
     PYTHON_PATH                     :

     conda not found
     ***pip packages***
     /usr/local/bin/pip
     Package                          Version
     -------------------------------- ---------------------
     absl-py                          1.4.0
     aiohttp                          3.9.3
     aiosignal                        1.3.1
     alabaster                        0.7.16
     albumentations                   1.3.1
     altair                           4.2.2
     annotated-types                  0.6.0
     anyio                            3.7.1
     appdirs                          1.4.4
     argon2-cffi                      23.1.0
     argon2-cffi-bindings             21.2.0
     array-record                     0.5.0
     arviz                            0.15.1
     astropy                          5.3.4
     astunparse                       1.6.3
     async-timeout                    4.0.3
     atpublic                         4.0
     attrs                            23.2.0
     audioread                        3.0.1
     autograd                         1.6.2
     Babel                            2.14.0
     backcall                         0.2.0
     beautifulsoup4                   4.12.3
     bidict                           0.22.1
     bigframes                        0.20.1
     bleach                           6.1.0
     blinker                          1.4
     blis                             0.7.11
     blosc2                           2.0.0
     bokeh                            3.3.4
     bqplot                           0.12.42
     branca                           0.7.1
     build                            1.0.3
     CacheControl                     0.14.0
     cachetools                       5.3.2
     catalogue                        2.0.10
     certifi                          2024.2.2
     cffi                             1.16.0
     chardet                          5.2.0
     charset-normalizer               3.3.2
     chex                             0.1.85
     click                            8.1.7
     click-plugins                    1.1.1
     cligj                            0.7.2
     cloudpathlib                     0.16.0
     cloudpickle                      2.2.1
     cmake                            3.27.9
     cmdstanpy                        1.2.1
     colorcet                         3.0.1
     colorlover                       0.3.0
     colour                           0.1.5
     community                        1.0.0b1
     confection                       0.1.4
     cons                             0.4.6
     contextlib2                      21.6.0
     contourpy                        1.2.0
     cryptography                     42.0.2
     cucim-cu12                       24.2.0
     cuda-python                      12.3.0
     cudf-cu12                        24.2.1
     cufflinks                        0.17.3
     cugraph-cu12                     24.2.0
     cuml-cu12                        24.2.0
     cuproj-cu12                      24.2.0
     cupy-cuda12x                     13.0.0
     cuspatial-cu12                   24.2.0
     cuxfilter-cu12                   24.2.0
     cvxopt                           1.3.2
     cvxpy                            1.3.3
     cycler                           0.12.1
     cymem                            2.0.8
     Cython                           3.0.8
     dask                             2024.1.1
     dask-cuda                        24.2.0
     dask-cudf-cu12                   24.2.1
     datascience                      0.17.6
     datashader                       0.16.0
     db-dtypes                        1.2.0
     dbus-python                      1.2.18
     debugpy                          1.6.6
     decorator                        4.4.2
     defusedxml                       0.7.1
     diskcache                        5.6.3
     distributed                      2024.1.1
     distro                           1.7.0
     dlib                             19.24.2
     dm-tree                          0.1.8
     docutils                         0.18.1
     dopamine-rl                      4.0.6
     duckdb                           0.9.2
     earthengine-api                  0.1.389
     easydict                         1.12
     ecos                             2.0.13
     editdistance                     0.6.2
     eerepr                           0.0.4
     en-core-web-sm                   3.7.1
     entrypoints                      0.4
     et-xmlfile                       1.1.0
     etils                            1.6.0
     etuples                          0.3.9
     exceptiongroup                   1.2.0
     fastai                           2.7.14
     fastcore                         1.5.29
     fastdownload                     0.0.7
     fastjsonschema                   2.19.1
     fastprogress                     1.0.3
     fastrlock                        0.8.2
     filelock                         3.13.1
     fiona                            1.9.5
     firebase-admin                   5.3.0
     Flask                            2.2.5
     flatbuffers                      23.5.26
     flax                             0.8.1
     folium                           0.14.0
     fonttools                        4.48.1
     frozendict                       2.4.0
     frozenlist                       1.4.1
     fsspec                           2023.6.0
     future                           0.18.3
     gast                             0.5.4
     gcsfs                            2023.6.0
     GDAL                             3.6.4
     gdown                            4.7.3
     geemap                           0.30.4
     gensim                           4.3.2
     geocoder                         1.38.1
     geographiclib                    2.0
     geopandas                        0.13.2
     geopy                            2.3.0
     gin-config                       0.5.0
     glob2                            0.7
     google                           2.0.3
     google-ai-generativelanguage     0.4.0
     google-api-core                  2.11.1
     google-api-python-client         2.84.0
     google-auth                      2.27.0
     google-auth-httplib2             0.1.1
     google-auth-oauthlib             1.2.0
     google-cloud-aiplatform          1.39.0
     google-cloud-bigquery            3.12.0
     google-cloud-bigquery-connection 1.12.1
     google-cloud-bigquery-storage    2.24.0
     google-cloud-core                2.3.3
     google-cloud-datastore           2.15.2
     google-cloud-firestore           2.11.1
     google-cloud-functions           1.13.3
     google-cloud-iam                 2.14.1
     google-cloud-language            2.9.1
     google-cloud-resource-manager    1.12.1
     google-cloud-storage             2.8.0
     google-cloud-translate           3.11.3
     google-colab                     1.0.0
     google-crc32c                    1.5.0
     google-generativeai              0.3.2
     google-pasta                     0.2.0
     google-resumable-media           2.7.0
     googleapis-common-protos         1.62.0
     googledrivedownloader            0.4
     graphviz                         0.20.1
     greenlet                         3.0.3
     grpc-google-iam-v1               0.13.0
     grpcio                           1.60.1
     grpcio-status                    1.48.2
     gspread                          3.4.2
     gspread-dataframe                3.3.1
     gym                              0.25.2
     gym-notices                      0.0.8
     h5netcdf                         1.3.0
     h5py                             3.9.0
     holidays                         0.42
     holoviews                        1.17.1
     html5lib                         1.1
     httpimport                       1.3.1
     httplib2                         0.22.0
     huggingface-hub                  0.20.3
     humanize                         4.7.0
     hyperopt                         0.2.7
     ibis-framework                   7.1.0
     idna                             3.6
     imageio                          2.31.6
     imageio-ffmpeg                   0.4.9
     imagesize                        1.4.1
     imbalanced-learn                 0.10.1
     imgaug                           0.4.0
     importlib-metadata               7.0.1
     importlib-resources              6.1.1
     imutils                          0.5.4
     inflect                          7.0.0
     iniconfig                        2.0.0
     install                          1.3.5
     intel-openmp                     2023.2.3
     ipyevents                        2.0.2
     ipyfilechooser                   0.6.0
     ipykernel                        5.5.6
     ipyleaflet                       0.18.2
     ipython                          7.34.0
     ipython-genutils                 0.2.0
     ipython-sql                      0.5.0
     ipytree                          0.2.2
     ipywidgets                       7.7.1
     itsdangerous                     2.1.2
     jax                              0.4.23
     jaxlib                           0.4.23+cuda12.cudnn89
     jeepney                          0.7.1
     jieba                            0.42.1
     Jinja2                           3.1.3
     joblib                           1.3.2
     jsonpickle                       3.0.2
     jsonschema                       4.19.2
     jsonschema-specifications        2023.12.1
     jupyter-client                   6.1.12
     jupyter-console                  6.1.0
     jupyter_core                     5.7.1
     jupyter-server                   1.24.0
     jupyter_server_proxy             4.1.0
     jupyterlab_pygments              0.3.0
     jupyterlab_widgets               3.0.10
     kaggle                           1.5.16
     kagglehub                        0.1.9
     keras                            2.15.0
     keyring                          23.5.0
     kiwisolver                       1.4.5
     langcodes                        3.3.0
     launchpadlib                     1.10.16
     lazr.restfulclient               0.14.4
     lazr.uri                         1.0.6
     lazy_loader                      0.3
     libclang                         16.0.6
     librosa                          0.10.1
     lida                             0.0.10
     lightgbm                         4.1.0
     linkify-it-py                    2.0.3
     llmx                             0.0.15a0
     llvmlite                         0.41.1
     locket                           1.0.0
     logical-unification              0.4.6
     lxml                             4.9.4
     malloy                           2023.1067
     Markdown                         3.5.2
     markdown-it-py                   3.0.0
     MarkupSafe                       2.1.5
     matplotlib                       3.7.1
     matplotlib-inline                0.1.6
     matplotlib-venn                  0.11.10
     mdit-py-plugins                  0.4.0
     mdurl                            0.1.2
     miniKanren                       1.0.3
     missingno                        0.5.2
     mistune                          0.8.4
     mizani                           0.9.3
     mkl                              2023.2.0
     ml-dtypes                        0.2.0
     mlxtend                          0.22.0
     more-itertools                   10.1.0
     moviepy                          1.0.3
     mpmath                           1.3.0
     msgpack                          1.0.7
     multidict                        6.0.5
     multipledispatch                 1.0.0
     multitasking                     0.0.11
     murmurhash                       1.0.10
     music21                          9.1.0
     natsort                          8.4.0
     nbclassic                        1.0.0
     nbclient                         0.9.0
     nbconvert                        6.5.4
     nbformat                         5.9.2
     nest-asyncio                     1.6.0
     networkx                         3.2.1
     nibabel                          4.0.2
     nltk                             3.8.1
     notebook                         6.5.5
     notebook_shim                    0.2.3
     numba                            0.58.1
     numexpr                          2.9.0
     numpy                            1.24.4
     nvtx                             0.2.10
     oauth2client                     4.1.3
     oauthlib                         3.2.2
     opencv-contrib-python            4.8.0.76
     opencv-python                    4.8.0.76
     opencv-python-headless           4.9.0.80
     openpyxl                         3.1.2
     opt-einsum                       3.3.0
     optax                            0.1.9
     orbax-checkpoint                 0.4.4
     osqp                             0.6.2.post8
     packaging                        23.2
     pandas                           1.5.3
     pandas-datareader                0.10.0
     pandas-gbq                       0.19.2
     pandas-stubs                     1.5.3.230304
     pandocfilters                    1.5.1
     panel                            1.3.8
     param                            2.0.2
     parso                            0.8.3
     parsy                            2.1
     partd                            1.4.1
     pathlib                          1.0.1
     patsy                            0.5.6
     peewee                           3.17.1
     pexpect                          4.9.0
     pickleshare                      0.7.5
     Pillow                           9.4.0
     pins                             0.8.4
     pip                              23.1.2
     pip-tools                        6.13.0
     platformdirs                     4.2.0
     plotly                           5.15.0
     plotnine                         0.12.4
     pluggy                           1.4.0
     polars                           0.20.2
     polars_api_compat                0.1.0
     pooch                            1.8.0
     portpicker                       1.5.2
     prefetch-generator               1.0.3
     preshed                          3.0.9
     prettytable                      3.9.0
     proglog                          0.1.10
     progressbar2                     4.2.0
     prometheus-client                0.19.0
     promise                          2.3
     prompt-toolkit                   3.0.43
     prophet                          1.1.5
     proto-plus                       1.23.0
     protobuf                         4.25.3
     psutil                           5.9.5
     psycopg2                         2.9.9
     ptyprocess                       0.7.0
     py-cpuinfo                       9.0.0
     py4j                             0.10.9.7
     pyarrow                          14.0.2
     pyarrow-hotfix                   0.6
     pyasn1                           0.5.1
     pyasn1-modules                   0.3.0
     pycocotools                      2.0.7
     pycparser                        2.21
     pyct                             0.5.0
     pydantic                         2.6.1
     pydantic_core                    2.16.2
     pydata-google-auth               1.8.2
     pydot                            1.4.2
     pydot-ng                         2.0.0
     pydotplus                        2.0.2
     PyDrive                          1.3.1
     PyDrive2                         1.6.3
     pyerfa                           2.0.1.1
     pygame                           2.5.2
     Pygments                         2.16.1
     PyGObject                        3.42.1
     PyJWT                            2.3.0
     pylibcugraph-cu12                24.2.0
     pylibraft-cu12                   24.2.0
     pymc                             5.7.2
     pymystem3                        0.2.0
     pynvjitlink-cu12                 0.1.12
     pynvml                           11.4.1
     PyOpenGL                         3.1.7
     pyOpenSSL                        24.0.0
     pyparsing                        3.1.1
     pyperclip                        1.8.2
     pyproj                           3.6.1
     pyproject_hooks                  1.0.0
     pyshp                            2.3.1
     PySocks                          1.7.1
     pytensor                         2.14.2
     pytest                           7.4.4
     python-apt                       0.0.0
     python-box                       7.1.1
     python-dateutil                  2.8.2
     python-louvain                   0.16
     python-slugify                   8.0.4
     python-utils                     3.8.2
     pytz                             2023.4
     pyviz_comms                      3.0.1
     PyWavelets                       1.5.0
     PyYAML                           6.0.1
     pyzmq                            23.2.1
     qdldl                            0.1.7.post0
     qudida                           0.0.4
     raft-dask-cu12                   24.2.0
     rapids-dask-dependency           24.2.0
     ratelim                          0.1.6
     referencing                      0.33.0
     regex                            2023.12.25
     requests                         2.31.0
     requests-oauthlib                1.3.1
     requirements-parser              0.5.0
     rich                             13.7.0
     rmm-cu12                         24.2.0
     rpds-py                          0.17.1
     rpy2                             3.4.2
     rsa                              4.9
     safetensors                      0.4.2
     scikit-image                     0.19.3
     scikit-learn                     1.2.2
     scipy                            1.11.4
     scooby                           0.9.2
     scs                              3.2.4.post1
     seaborn                          0.13.1
     SecretStorage                    3.3.1
     Send2Trash                       1.8.2
     sentencepiece                    0.1.99
     setuptools                       67.7.2
     shapely                          2.0.2
     simpervisor                      1.0.0
     six                              1.16.0
     sklearn-pandas                   2.2.0
     smart-open                       6.4.0
     sniffio                          1.3.0
     snowballstemmer                  2.2.0
     sortedcontainers                 2.4.0
     soundfile                        0.12.1
     soupsieve                        2.5
     soxr                             0.3.7
     spacy                            3.7.2
     spacy-legacy                     3.0.12
     spacy-loggers                    1.0.5
     Sphinx                           5.0.2
     sphinxcontrib-applehelp          1.0.8
     sphinxcontrib-devhelp            1.0.6
     sphinxcontrib-htmlhelp           2.0.5
     sphinxcontrib-jsmath             1.0.1
     sphinxcontrib-qthelp             1.0.7
     sphinxcontrib-serializinghtml    1.1.10
     SQLAlchemy                       2.0.27
     sqlglot                          19.9.0
     sqlparse                         0.4.4
     srsly                            2.4.8
     stanio                           0.3.0
     statsmodels                      0.14.1
     sympy                            1.12
     tables                           3.8.0
     tabulate                         0.9.0
     tbb                              2021.11.0
     tblib                            3.0.0
     tenacity                         8.2.3
     tensorboard                      2.15.2
     tensorboard-data-server          0.7.2
     tensorflow                       2.15.0
     tensorflow-datasets              4.9.4
     tensorflow-estimator             2.15.0
     tensorflow-gcs-config            2.15.0
     tensorflow-hub                   0.16.1
     tensorflow-io-gcs-filesystem     0.36.0
     tensorflow-metadata              1.14.0
     tensorflow-probability           0.23.0
     tensorstore                      0.1.45
     termcolor                        2.4.0
     terminado                        0.18.0
     text-unidecode                   1.3
     textblob                         0.17.1
     tf-keras                         2.15.0
     tf-slim                          1.1.0
     thinc                            8.2.3
     threadpoolctl                    3.2.0
     tifffile                         2024.2.12
     tinycss2                         1.2.1
     tokenizers                       0.15.2
     toml                             0.10.2
     tomli                            2.0.1
     toolz                            0.12.1
     torch                            2.1.0+cu121
     torchaudio                       2.1.0+cu121
     torchdata                        0.7.0
     torchsummary                     1.5.1
     torchtext                        0.16.0
     torchvision                      0.16.0+cu121
     tornado                          6.3.2
     tqdm                             4.66.2
     traitlets                        5.7.1
     traittypes                       0.2.1
     transformers                     4.35.2
     treelite                         4.0.0
     triton                           2.1.0
     tweepy                           4.14.0
     typer                            0.9.0
     types-pytz                       2024.1.0.20240203
     types-setuptools                 69.0.0.20240125
     typing_extensions                4.9.0
     tzlocal                          5.2
     uc-micro-py                      1.0.3
     ucx-py-cu12                      0.36.0
     uritemplate                      4.1.1
     urllib3                          2.0.7
     vega-datasets                    0.9.0
     wadllib                          1.3.6
     wasabi                           1.1.2
     wcwidth                          0.2.13
     weasel                           0.3.4
     webcolors                        1.13
     webencodings                     0.5.1
     websocket-client                 1.7.0
     Werkzeug                         3.0.1
     wheel                            0.42.0
     widgetsnbextension               3.6.6
     wordcloud                        1.9.3
     wrapt                            1.14.1
     xarray                           2023.7.0
     xarray-einstats                  0.7.0
     xgboost                          2.0.3
     xlrd                             2.0.1
     xxhash                           3.4.1
     xyzservices                      2023.10.1
     yarl                             1.9.4
     yellowbrick                      1.5
     yfinance                         0.2.36
     zict                             3.0.0
     zipp                             3.17.0

shwina commented 9 months ago

Thanks for reporting, Marco! A couple of side notes here, both of which I think you may already be aware of, but I'll include them for posterity:

  1. The same thing can be achieved with:
In [14]: df.groupby('b').agg({'a': 'sum', 'c': 'mean'})
Out[14]:
   a    c
b
4  3  7.5
5  3  9.0
  2. .apply() is going to be much slower than .agg() in the general case. It works by iterating over the groups and applying the UDF to each group. In certain cases, we are able to JIT compile the input function to .apply() and it can be very fast - see https://docs.rapids.ai/api/cudf/stable/user_guide/guide-to-udfs/#overview-of-user-defined-functions-with-cudf.
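
To make the difference between the two paths concrete, here is a rough pure-Python sketch. It is not cuDF's or pandas' actual implementation: the helpers `group_rows`, `apply_per_group` and `agg_per_column` are hypothetical names invented for illustration, and the data is the toy frame from the issue stored as plain dicts. The apply-style path calls an opaque Python function once per group, while the agg-style path receives a declarative spec of named reductions.

```python
from collections import defaultdict
from statistics import fmean

# Same toy data as in the issue, stored as plain Python rows.
rows = [
    {"a": 1, "b": 4, "c": 7},
    {"a": 2, "b": 4, "c": 8},
    {"a": 3, "b": 5, "c": 9},
]


def group_rows(rows, key):
    """Split rows into groups keyed by the value of `key` (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def apply_per_group(rows, key, udf):
    """apply-style path: call an arbitrary Python function once per group,
    then stack the per-group results into one record per group."""
    return [{key: k, **udf(g)} for k, g in sorted(group_rows(rows, key).items())]


def agg_per_column(rows, key, spec):
    """agg-style path: `spec` maps output name -> (column, reduction)."""
    out = []
    for k, g in sorted(group_rows(rows, key).items()):
        record = {key: k}
        for name, (col, reduce_fn) in spec.items():
            record[name] = reduce_fn([r[col] for r in g])
        out.append(record)
    return out


applied = apply_per_group(
    rows,
    "b",
    lambda g: {
        "a_sum": sum(r["a"] for r in g),
        "c_mean": fmean([r["c"] for r in g]),
    },
)
aggregated = agg_per_column(
    rows, "b", {"a_sum": ("a", sum), "c_mean": ("c", fmean)}
)
print(applied)
print(aggregated)
```

Both paths produce the same records here. The practical difference is that a library can see exactly which reductions an agg-style spec asks for and dispatch them to optimized routines, whereas the function passed to the apply-style path is opaque and generally has to be run group by group.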

brandon-b-miller commented 9 months ago

Hi @MarcoGorelli, thanks for reporting. This case wasn't handled by cuDF's apply post-processing machinery. I've opened a PR that should fix the problem.