[BUG] dask_cudf groupby mean is numerically unstable

Describe the bug

The dask_cudf groupby mean is numerically unstable: repeated runs on identical input return different results.

Steps/Code to reproduce bug

import numpy as np
import cudf
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# Start a local CUDA cluster and connect a client to it.
cluster = LocalCUDACluster()
client = Client(cluster)

for i in range(100):
    # Reseed on every trial so each iteration builds identical input data.
    np.random.seed(3)
    size = 100
    groups = 20
    df = cudf.DataFrame()
    df['asset'] = np.random.randint(1, groups, size)
    df['num'] = np.random.rand(size)
    cdf = dask_cudf.from_cudf(df, npartitions=16)
    # Non-distributed reference result versus the partitioned result.
    gt = df.groupby('asset').mean()
    dm = cdf.groupby('asset').mean().compute().reset_index().sort_values('index')
    print('trial', i, (gt['num']-dm['num']).abs().max())

Expected behavior

The distributed and non-distributed groupby means should be equal, and the result should be stable (independent of the trial number). Instead, the code above prints values that vary randomly from trial to trial.
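To make the expectation concrete, here is a minimal CPU-only sketch in plain Python. It is not dask_cudf's implementation: it assumes the common scheme in which each partition produces a per-group (sum, count) pair and the partials are then combined. The helper names `grouped_mean` and `partitioned_grouped_mean` are hypothetical. With that scheme, the partitioned mean may differ from the single-pass mean only by floating-point rounding, and the difference should be identical on every trial.

```python
import math
import random
from collections import defaultdict


def grouped_mean(rows):
    """Single-pass mean per key over a list of (key, value) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


def partitioned_grouped_mean(rows, npartitions):
    """Mean per key built from per-partition (sum, count) partials.

    Assumption: this mirrors a generic split/combine reduction, not the
    actual dask_cudf code path.
    """
    chunk = math.ceil(len(rows) / npartitions)
    total_sums, total_counts = defaultdict(float), defaultdict(int)
    for start in range(0, len(rows), chunk):
        # Reduce one partition on its own.
        sums, counts = defaultdict(float), defaultdict(int)
        for key, value in rows[start:start + chunk]:
            sums[key] += value
            counts[key] += 1
        # Combine the partition's partials into the running totals.
        for k in sums:
            total_sums[k] += sums[k]
            total_counts[k] += counts[k]
    return {k: total_sums[k] / total_counts[k] for k in total_sums}


# Same shape as the reproducer: 100 rows, group keys in 1..19.
rng = random.Random(3)
rows = [(rng.randint(1, 19), rng.random()) for _ in range(100)]

reference = grouped_mean(rows)
max_diffs = []
for trial in range(5):
    result = partitioned_grouped_mean(rows, npartitions=16)
    # Compare key by key so that row order cannot affect the difference.
    max_diffs.append(max(abs(reference[k] - result[k]) for k in reference))

print(max_diffs)
```

Every entry of `max_diffs` should be the same value and at the level of floating-point rounding. Looking results up by group key, rather than subtracting two columns by position, also keeps a difference in row order from showing up as a numerical difference.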

Environment overview

DGX-1 machine

Environment details


     ***git***
     Not inside a git repository

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=18.04
     DISTRIB_CODENAME=bionic
     DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS"
     NAME="Ubuntu"
     VERSION="18.04.2 LTS (Bionic Beaver)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 18.04.2 LTS"
     VERSION_ID="18.04"
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     VERSION_CODENAME=bionic
     UBUNTU_CODENAME=bionic
     Linux 1807b463c94e 4.4.0-130-generic #156-Ubuntu SMP Thu Jun 14 08:53:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Mon Aug 12 22:09:42 2019
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |===============================+======================+======================|
     |   0  Tesla V100-SXM2...  On   | 00000000:06:00.0 Off |                    0 |
     | N/A   29C    P0    41W / 300W |      0MiB / 16130MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   1  Tesla V100-SXM2...  On   | 00000000:07:00.0 Off |                    0 |
     | N/A   29C    P0    43W / 300W |      0MiB / 16130MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   2  Tesla V100-SXM2...  On   | 00000000:0A:00.0 Off |                    0 |
     | N/A   30C    P0    43W / 300W |      0MiB / 16130MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   3  Tesla V100-SXM2...  On   | 00000000:0B:00.0 Off |                    0 |
     | N/A   27C    P0    41W / 300W |      0MiB / 16130MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   4  Tesla V100-SXM2...  On   | 00000000:85:00.0 Off |                    0 |
     | N/A   30C    P0    43W / 300W |      0MiB / 16130MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   5  Tesla V100-SXM2...  On   | 00000000:86:00.0 Off |                    0 |
     | N/A   29C    P0    41W / 300W |      0MiB / 16130MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   6  Tesla V100-SXM2...  On   | 00000000:89:00.0 Off |                    0 |
     | N/A   31C    P0    43W / 300W |      0MiB / 16130MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+
     |   7  Tesla V100-SXM2...  On   | 00000000:8A:00.0 Off |                    0 |
     | N/A   29C    P0    41W / 300W |      0MiB / 16130MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+

     +-----------------------------------------------------------------------------+
     | Processes:                                                       GPU Memory |
     |  GPU       PID   Type   Process name                             Usage      |
     |=============================================================================|
     |  No running processes found                                                 |
     +-----------------------------------------------------------------------------+

     ***CPU***
     Architecture:        x86_64
     CPU op-mode(s):      32-bit, 64-bit
     Byte Order:          Little Endian
     CPU(s):              80
     On-line CPU(s) list: 0-79
     Thread(s) per core:  2
     Core(s) per socket:  20
     Socket(s):           2
     NUMA node(s):        2
     Vendor ID:           GenuineIntel
     CPU family:          6
     Model:               79
     Model name:          Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
     Stepping:            1
     CPU MHz:             2497.000
     CPU max MHz:         3600.0000
     CPU min MHz:         1200.0000
     BogoMIPS:            4389.87
     Virtualization:      VT-x
     L1d cache:           32K
     L1i cache:           32K
     L2 cache:            256K
     L3 cache:            51200K
     NUMA node0 CPU(s):   0-19,40-59
     NUMA node1 CPU(s):   20-39,60-79
     Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt ibrs ibpb stibp kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

     ***CMake***

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
     Copyright (C) 2017 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***
     /usr/local/cuda/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2018 NVIDIA Corporation
     Built on Sat_Aug_25_21:08:01_CDT_2018
     Cuda compilation tools, release 10.0, V10.0.130

     ***Python***
     /conda/envs/rapids/bin/python
     Python 3.6.8

     ***Environment Variables***
     PATH                            : /conda/envs/rapids/bin:/conda/condabin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/conda/bin
     LD_LIBRARY_PATH                 : /usr/local/nvidia/lib:/usr/local/nvidia/lib64
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /conda/envs/rapids
     PYTHON_PATH                     :

     ***conda packages***
     /conda/condabin/conda
     # packages in environment at /conda/envs/rapids:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                        main
     aiohttp                   3.5.4                    pypi_0    pypi
     alabaster                 0.7.12                   pypi_0    pypi
     arrow-cpp                 0.12.1           py36h0e61e49_0    conda-forge
     async-timeout             3.0.1                    pypi_0    pypi
     atomicwrites              1.3.0                      py_0    conda-forge
     attrs                     19.1.0                     py_0    conda-forge
     babel                     2.7.0                    pypi_0    pypi
     backcall                  0.1.0                      py_0    conda-forge
     bleach                    3.1.0                      py_0    conda-forge
     blosc                     1.17.0               he1b5a44_0    conda-forge
     bokeh                     1.2.0                    py36_0    conda-forge
     boost-cpp                 1.68.0            h11c811c_1000    conda-forge
     bqplot                    0.11.5                     py_0    conda-forge
     bzip2                     1.0.6             h14c3975_1002    conda-forge
     ca-certificates           2019.6.16            hecc5488_0    conda-forge
     cairo                     1.16.0            h18b612c_1001    conda-forge
     certifi                   2019.6.16                py36_1    conda-forge
     cffi                      1.12.3           py36h8022711_0    conda-forge
     chardet                   3.0.4                    pypi_0    pypi
     click                     7.0                        py_0    conda-forge
     cloudpickle               1.2.1                      py_0    conda-forge
     commonmark                0.9.0                    pypi_0    pypi
     cudatoolkit               10.0.130                      0
     cudf                      0.8.0                    py36_0    rapidsai/label/cuda10.0
     cugraph                   0.8.1                    py36_0    rapidsai/label/cuda10.0
     cuml                      0.8.0           cuda10.0_py36_0    rapidsai/label/cuda10.0
     cupy-cuda100              6.2.0                    pypi_0    pypi
     cycler                    0.10.0                     py_1    conda-forge
     cython                    0.29.10          py36he1b5a44_0    conda-forge
     cytoolz                   0.9.0.1         py36h14c3975_1001    conda-forge
     dask                      2.0.0                      py_0    conda-forge
     dask-core                 2.0.0                      py_0    conda-forge
     dask-cuda                 0.8.0                    py36_0    rapidsai/label/cuda10.0
     dask-cudf                 0.8.0                    py36_0    rapidsai/label/cuda10.0
     dask-cuml                 0.8.0                    py36_0    rapidsai/label/cuda10.0
     dask-labextension         1.0.3                    pypi_0    pypi
     dask-xgboost              0.2.0.dev28      cuda10.0py37_0    rapidsai/label/xgboost
     dbus                      1.13.6               he372182_0    conda-forge
     decorator                 4.4.0                      py_0    conda-forge
     defusedxml                0.5.0                      py_1    conda-forge
     distributed               2.0.1+1.gf8af742          pypi_0    pypi
     docutils                  0.15.2                   pypi_0    pypi
     entrypoints               0.3                   py36_1000    conda-forge
     expat                     2.2.5             he1b5a44_1003    conda-forge
     fastrlock                 0.4                      pypi_0    pypi
     fontconfig                2.13.1            he4413a7_1000    conda-forge
     freetype                  2.10.0               he983fc9_0    conda-forge
     fribidi                   1.0.5             h516909a_1002    conda-forge
     future                    0.17.1                   pypi_0    pypi
     gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
     glib                      2.58.3            h6f030ca_1001    conda-forge
     graphite2                 1.3.13            hf484d3e_1000    conda-forge
     graphviz                  2.40.1               h5933667_1    conda-forge
     gst-plugins-base          1.14.5               h0935bb2_0    conda-forge
     gstreamer                 1.14.5               h36ae1b5_0    conda-forge
     harfbuzz                  2.4.0                h37c48d4_1    conda-forge
     hdf5                      1.10.5          nompi_h3c11f04_1100    conda-forge
     heapdict                  1.0.0                 py36_1000    conda-forge
     icu                       58.2              hf484d3e_1000    conda-forge
     idna                      2.8                      pypi_0    pypi
     idna-ssl                  1.1.0                    pypi_0    pypi
     imagesize                 1.1.0                    pypi_0    pypi
     importlib_metadata        0.18                     py36_0    conda-forge
     intel-openmp              2019.4                      243
     ipykernel                 5.1.1            py36h24bf2e0_0    conda-forge
     ipython                   7.3.0            py36h24bf2e0_0    conda-forge
     ipython_genutils          0.2.0                      py_1    conda-forge
     ipywidgets                7.4.2                      py_0    conda-forge
     jedi                      0.14.0                   py36_0    conda-forge
     jinja2                    2.10.1                     py_0    conda-forge
     joblib                    0.13.2                     py_0    conda-forge
     jpeg                      9c                h14c3975_1001    conda-forge
     json5                     0.8.4                      py_0    conda-forge
     jsonschema                3.0.1                    py36_0    conda-forge
     jupyter-server-proxy      1.1.0                    pypi_0    pypi
     jupyter_client            5.2.4                      py_3    conda-forge
     jupyter_core              4.4.0                      py_0    conda-forge
     jupyterlab                0.35.4                   py36_0    conda-forge
     jupyterlab_server         0.2.0                      py_0    conda-forge
     kiwisolver                1.1.0            py36hc9558a2_0    conda-forge
     libblas                   3.8.0                7_openblas    conda-forge
     libcblas                  3.8.0                7_openblas    conda-forge
     libcudf                   0.8.0                cuda10.0_0    rapidsai/label/cuda10.0
     libcugraph                0.8.1                cuda10.0_0    rapidsai/label/cuda10.0
     libcuml                   0.8.0                cuda10.0_0    rapidsai/label/cuda10.0
     libcumlmg                 0.0.0.dev0         cuda10.0_373    nvidia/label/cuda10.0
     libedit                   3.1.20181209         hc058e9b_0
     libffi                    3.2.1                hd88cf55_4
     libgcc-ng                 9.1.0                hdf63c60_0
     libgfortran-ng            7.3.0                hdf63c60_0
     libiconv                  1.15              h516909a_1005    conda-forge
     liblapack                 3.8.0                7_openblas    conda-forge
     libnvstrings              0.8.0                cuda10.0_0    rapidsai/label/cuda10.0
     libpng                    1.6.37               hed695b0_0    conda-forge
     libprotobuf               3.6.1             hdbcaa40_1001    conda-forge
     librmm                    0.8.0                cuda10.0_0    rapidsai/label/cuda10.0
     libsodium                 1.0.16            h14c3975_1001    conda-forge
     libstdcxx-ng              9.1.0                hdf63c60_0
     libtiff                   4.0.10            h57b8799_1003    conda-forge
     libtool                   2.4.6             h14c3975_1002    conda-forge
     libuuid                   2.32.1            h14c3975_1000    conda-forge
     libxcb                    1.13              h14c3975_1002    conda-forge
     libxgboost                0.90.rapidsdev1      cuda10.0_1    rapidsai/label/xgboost
     libxml2                   2.9.9                h13577e0_0    conda-forge
     llvmlite                  0.27.0dev0      py36hf484d3e_19    numba
     locket                    0.2.0                      py_2    conda-forge
     lz4-c                     1.8.3             he1b5a44_1001    conda-forge
     lzo                       2.10              h14c3975_1000    conda-forge
     markupsafe                1.1.1            py36h14c3975_0    conda-forge
     matplotlib                3.1.0            py36h5429711_0
     mistune                   0.8.4           py36h14c3975_1000    conda-forge
     mkl                       2019.4                      243
     mock                      3.0.5                    py36_0    conda-forge
     more-itertools            4.3.0                 py36_1000    conda-forge
     msgpack-python            0.6.1            py36h6bb024c_0    conda-forge
     multidict                 4.5.2                    pypi_0    pypi
     nbconvert                 5.5.0                      py_0    conda-forge
     nbformat                  4.4.0                      py_1    conda-forge
     nccl                      2.4.6.1              cuda10.0_0    nvidia
     ncurses                   6.1                  he6710b0_1
     networkx                  2.3                        py_0    conda-forge
     nodejs                    11.11.0              hf484d3e_0    conda-forge
     notebook                  5.7.8                    py36_1    conda-forge
     numba                     0.41.0          py36h637b7d7_1000    conda-forge
     numexpr                   2.6.9           py36h637b7d7_1000    conda-forge
     numpy                     1.16.2           py36h8b7e671_1    conda-forge
     numpydoc                  0.9.1                    pypi_0    pypi
     nvstrings                 0.8.0                    py36_0    rapidsai/label/cuda10.0
     nxpd                      0.2.0                    pypi_0    pypi
     olefile                   0.46                       py_0    conda-forge
     openblas                  0.3.5             h9ac9557_1001    conda-forge
     openssl                   1.1.1c               h516909a_0    conda-forge
     packaging                 19.0                       py_0    conda-forge
     pandas                    0.23.4          py36h637b7d7_1000    conda-forge
     pandoc                    2.7.3                         0    conda-forge
     pandocfilters             1.4.2                      py_1    conda-forge
     pango                     1.42.4               he7ab937_0    conda-forge
     parquet-cpp               1.5.1                         4    conda-forge
     parso                     0.5.0                      py_0    conda-forge
     partd                     1.0.0                      py_0    conda-forge
     patsy                     0.5.1                      py_0    conda-forge
     pcre                      8.41              hf484d3e_1003    conda-forge
     pexpect                   4.7.0                    py36_0    conda-forge
     pickleshare               0.7.5                 py36_1000    conda-forge
     pillow                    5.3.0           py36h00a061d_1000    conda-forge
     pip                       19.1.1                   py36_0
     pixman                    0.38.0            h516909a_1003    conda-forge
     pluggy                    0.12.0                     py_0    conda-forge
     prometheus_client         0.7.1                      py_0    conda-forge
     prompt_toolkit            2.0.9                      py_0    conda-forge
     psutil                    5.6.3            py36h516909a_0    conda-forge
     pthread-stubs             0.4               h14c3975_1001    conda-forge
     ptyprocess                0.6.0                   py_1001    conda-forge
     pudb                      2019.1                   pypi_0    pypi
     py                        1.8.0                      py_0    conda-forge
     py-xgboost                0.90.rapidsdev1  cuda10.0py36_1    rapidsai/label/xgboost
     pyarrow                   0.12.1           py36hbbcf98d_0    conda-forge
     pycparser                 2.19                     py36_1    conda-forge
     pygments                  2.4.2                      py_0    conda-forge
     pyparsing                 2.4.0                      py_0    conda-forge
     pyqt                      5.9.2            py36hcca6a23_0    conda-forge
     pyrsistent                0.15.2           py36h516909a_0    conda-forge
     pytables                  3.5.2            py36h9f153d1_1    conda-forge
     pytest                    4.6.3                    py36_0    conda-forge
     python                    3.6.8                h0371630_0
     python-dateutil           2.8.0                      py_0    conda-forge
     python-graphviz           0.11.1                     py_1    conda-forge
     pytz                      2019.1                     py_0    conda-forge
     pyyaml                    5.1.1            py36h516909a_0    conda-forge
     pyzmq                     18.0.2           py36hc4ba49a_0    conda-forge
     qt                        5.9.7                h52cfd70_2    conda-forge
     readline                  7.0                  h7b6447c_5
     recommonmark              0.5.0                    pypi_0    pypi
     requests                  2.22.0                   pypi_0    pypi
     rmm                       0.8.0                    py36_0    rapidsai/label/cuda10.0
     scikit-learn              0.21.2           py36hcdab131_1    conda-forge
     scipy                     1.3.0            py36h921218d_0    conda-forge
     seaborn                   0.9.0                      py_0    conda-forge
     send2trash                1.5.0                      py_0    conda-forge
     setuptools                41.0.1                   py36_0
     simpervisor               0.3                      pypi_0    pypi
     sip                       4.19.8          py36hf484d3e_1000    conda-forge
     six                       1.12.0                py36_1000    conda-forge
     snowballstemmer           1.9.0                    pypi_0    pypi
     sortedcontainers          2.1.0                      py_0    conda-forge
     sphinx                    2.1.2                    pypi_0    pypi
     sphinx-rtd-theme          0.4.3                    pypi_0    pypi
     sphinxcontrib-applehelp   1.0.1                    pypi_0    pypi
     sphinxcontrib-devhelp     1.0.1                    pypi_0    pypi
     sphinxcontrib-htmlhelp    1.0.2                    pypi_0    pypi
     sphinxcontrib-jsmath      1.0.1                    pypi_0    pypi
     sphinxcontrib-qthelp      1.0.2                    pypi_0    pypi
     sphinxcontrib-serializinghtml 1.1.3                    pypi_0    pypi
     sqlite                    3.28.0               h7b6447c_0
     statsmodels               0.10.0           py36hc1659b7_0    conda-forge
     tblib                     1.4.0                      py_0    conda-forge
     terminado                 0.8.2                    py36_0    conda-forge
     testpath                  0.4.2                   py_1001    conda-forge
     thrift-cpp                0.12.0            h0a07b25_1002    conda-forge
     tk                        8.6.8                hbc83047_0
     toolz                     0.9.0                      py_1    conda-forge
     tornado                   6.0.3            py36h516909a_0    conda-forge
     traitlets                 4.3.2                 py36_1000    conda-forge
     traittypes                0.2.1                      py_1    conda-forge
     typing-extensions         3.7.4                    pypi_0    pypi
     urllib3                   1.25.3                   pypi_0    pypi
     urwid                     2.0.1                    pypi_0    pypi
     wcwidth                   0.1.7                      py_1    conda-forge
     webencodings              0.5.1                      py_1    conda-forge
     wheel                     0.33.4                   py36_0
     widgetsnbextension        3.4.2                 py36_1000    conda-forge
     xgboost                   0.90.rapidsdev1  cuda10.0py36_1    rapidsai/label/xgboost
     xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
     xorg-libice               1.0.10               h516909a_0    conda-forge
     xorg-libsm                1.2.3             h84519dc_1000    conda-forge
     xorg-libx11               1.6.8                h516909a_0    conda-forge
     xorg-libxau               1.0.9                h14c3975_0    conda-forge
     xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
     xorg-libxext              1.3.4                h516909a_0    conda-forge
     xorg-libxpm               3.5.12            h14c3975_1002    conda-forge
     xorg-libxrender           0.9.10            h516909a_1002    conda-forge
     xorg-libxt                1.1.5             h516909a_1003    conda-forge
     xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
     xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
     xorg-xproto               7.0.31            h14c3975_1007    conda-forge
     xz                        5.2.4                h14c3975_4
     yaml                      0.1.7             h14c3975_1001    conda-forge
     yarl                      1.3.0                    pypi_0    pypi
     zeromq                    4.3.1             hf484d3e_1000    conda-forge
     zict                      1.0.0                      py_0    conda-forge
     zipp                      0.5.1                      py_0    conda-forge
     zlib                      1.2.11               h7b6447c_3
     zstd                      1.4.0                h3b9ef0a_0    conda-forge

rapidsai / cudf

[BUG] dask_cudf groupby mean is numerically unstable #2543