[BUG] Creating a DataFrame from a numpy array consumes too much RAM #10107

Closed NightMachinery closed 2 years ago

NightMachinery commented 2 years ago

Describe the bug

Creating a DataFrame from a NumPy array consumes too much RAM.

Steps/Code to reproduce bug

command time -f 'Max_memory: %M' python -c '
import numpy
np = numpy

n = 10**(5+4)
# a = numpy.random.default_rng().standard_normal(size=(n), dtype=np.float32)
a = np.ones((n,), dtype=np.float32)

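import cudf
# NOTE: the end of this command is missing from this copy of the issue.
# The next line is an assumed reconstruction based on the issue title.
df = cudf.DataFrame(a)
'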

This results in the following main memory (not GPU memory!) usage:

Expected behavior

The NumPy array itself occupies (you can comment out the last two lines to see this):

The expected behavior is that the conversion should happen with some constant O(1) overhead, not an O(n) overhead.
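For reference, the sizes involved can be worked out directly. The following is a minimal sketch, not part of the original report: `array_nbytes` and `peak_rss_bytes` are hypothetical helper names, and the only assumptions are that a float32 element takes 4 bytes and that `resource.getrusage` reports the same kind of peak resident set size that `time -f '%M'` prints for the child process.

```python
import resource
import sys

FLOAT32_ITEMSIZE = 4  # bytes per np.float32 element


def array_nbytes(n, itemsize=FLOAT32_ITEMSIZE):
    """Bytes needed for the raw data of a 1-D array with n elements."""
    return n * itemsize


def peak_rss_bytes():
    """Peak resident set size of the current process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


payload = array_nbytes(10**9)  # n = 10**(5+4) from the reproducer
print(f"array data: {payload / 2**30:.2f} GiB")  # array data: 3.73 GiB
print(f"peak RSS so far: {peak_rss_bytes() / 2**20:.1f} MiB")
```

With O(1) conversion overhead, the peak host memory would stay close to that payload size plus a fixed cost for the interpreter and libraries. Any additional term that grows with n is the overhead this report is about.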

Environment overview (please complete the following information)

Environment details

Please run and paste the output of the cudf/print_env.sh script here, to gather any other relevant environment details

<details><summary>Click here to see environment details</summary><pre>

     commit f1e0bb6a4ee766e68d93a9958688dd9ee0df7333 (HEAD -> branch-22.04, origin/branch-22.04, origin/HEAD)
     Merge: 57ff6f55b9 5a4c5f36f0
     Author: gpuCI <38199262+GPUtester@users.noreply.github.com>
     Date:   Fri Jan 21 14:26:43 2022 -0500

     Merge pull request #10106 from rapidsai/branch-22.02

     [gpuCI] Forward-merge branch-22.02 to branch-22.04 [skip gpuci]
     ***git submodules***

     ***OS Information***
     DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
     VERSION="18.04.5 LTS (Bionic Beaver)"
     PRETTY_NAME="Ubuntu 18.04.5 LTS"
     Linux 97bea0896dfd 5.4.144+ #1 SMP Tue Dec 7 09:58:10 PST 2021 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Fri Jan 21 21:55:38 2022
     | NVIDIA-SMI 495.46       Driver Version: 460.32.03    CUDA Version: 11.2     |
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |                               |                      |               MIG M. |
     |   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
     | N/A   42C    P0    27W / 250W |      0MiB / 16280MiB |      0%      Default |
     |                               |                      |                  N/A |

     | Processes:                                                                  |
     |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
     |        ID   ID                                                   Usage      |
     |  No running processes found                                                 |

     Architecture:        x86_64
     CPU op-mode(s):      32-bit, 64-bit
     Byte Order:          Little Endian
     CPU(s):              4
     On-line CPU(s) list: 0-3
     Thread(s) per core:  2
     Core(s) per socket:  2
     Socket(s):           1
     NUMA node(s):        1
     Vendor ID:           GenuineIntel
     CPU family:          6
     Model:               63
     Model name:          Intel(R) Xeon(R) CPU @ 2.30GHz
     Stepping:            0
     CPU MHz:             2299.998
     BogoMIPS:            4599.99
     Hypervisor vendor:   KVM
     Virtualization type: full
     L1d cache:           32K
     L1i cache:           32K
     L2 cache:            256K
     L3 cache:            46080K
     NUMA node0 CPU(s):   0-3
     Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities

     cmake version 3.12.0

     CMake suite maintained and supported by Kitware (kitware.com/cmake).

     g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
     Copyright (C) 2017 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO

     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2020 NVIDIA Corporation
     Built on Mon_Oct_12_20:09:46_PDT_2020
     Cuda compilation tools, release 11.1, V11.1.105
     Build cuda_11.1.TC455_06.29190527_0

     Python 3.8.10

     ***Environment Variables***
     PATH                            : /root/miniconda3/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin:/opt/bin
     LD_LIBRARY_PATH                 : /usr/lib64-nvidia
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    :
     PYTHON_PATH                     :
</pre></details>

beckernick commented 2 years ago

This is likely coming from cupy's asarray, which is used in the DataFrame constructor when you pass a numpy array.

Would you be open to filing this issue instead at https://github.com/cupy/cupy/issues/ to consolidate discussion?

%load_ext memory_profiler
import numpy as np
import cupy
for n in (7, 8, 9):
    a = np.ones((10**n,), dtype=np.float32)
    %memit cupy.asarray(a)
peak memory: 514.58 MiB, increment: 350.82 MiB
peak memory: 1369.92 MiB, increment: 512.01 MiB
peak memory: 8899.16 MiB, increment: 4096.00 MiB

NightMachinery commented 2 years ago

@beckernick I filed an issue there, but this can only explain about half of the excess memory consumption.

beckernick commented 2 years ago

Thanks. The differences in the tests above for the smaller data sizes (but the same result for 9 GB) are probably related to how CuPy uses pool allocators for performance.

Is CuPy's use of CPU memory affecting a cuDF or Dask workload negatively?
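As a rough check on the pool-allocator explanation, the `%memit` increments posted earlier in the thread can be compared against the array sizes. The sketch below is only arithmetic on those reported numbers; `next_power_of_two` is a hypothetical helper, and the idea that a host staging buffer is rounded up to a power of two is an assumption used for comparison, not something confirmed in this thread.

```python
MIB = 2**20
FLOAT32_ITEMSIZE = 4  # bytes per np.float32 element

# Increments reported by %memit in the thread, keyed by the exponent n.
observed_increment_mib = {7: 350.82, 8: 512.01, 9: 4096.00}


def next_power_of_two(nbytes):
    """Smallest power of two that is >= nbytes (nbytes must be positive)."""
    return 1 << (nbytes - 1).bit_length()


for exponent, increment in observed_increment_mib.items():
    nbytes = FLOAT32_ITEMSIZE * 10**exponent
    print(
        f"10**{exponent}: array {nbytes / MIB:8.2f} MiB, "
        f"next power of two {next_power_of_two(nbytes) / MIB:7.2f} MiB, "
        f"observed increment {increment:7.2f} MiB"
    )
```

For 10**8 and 10**9 elements the observed increments (512.01 MiB and 4096.00 MiB) sit within 0.01 MiB of the next power of two above the array size. The 10**7 case does not fit this pattern; it is the first call in the loop, so it may include one-time initialization costs, though that is a guess.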

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

beckernick commented 2 years ago

Closing this issue, as this is generally expected behavior and implementing a cap for CPU pinned memory that affects other libraries by default is out of scope for cuDF. Please take further discussion to the CuPy issue linked above.
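For readers who do need to limit host memory held by CuPy, a possible user-side workaround is sketched below. It relies on CuPy's documented `cupy.get_default_pinned_memory_pool()`, `PinnedMemoryPool.free_all_blocks()`, and `cupy.cuda.set_pinned_memory_allocator(None)`. The helper names `release_pinned_pool` and `disable_pinned_pool` are hypothetical, and whether this recovers the memory measured in this report is an assumption that was not verified in the thread.

```python
try:
    import cupy  # needs a CUDA-capable environment
except ImportError:
    cupy = None


def release_pinned_pool(cp):
    """Free pinned (page-locked) host blocks cached by CuPy's default pool.

    `cp` is the imported cupy module, passed in explicitly.
    Returns the number of free blocks that were cached before the release.
    """
    pool = cp.get_default_pinned_memory_pool()
    cached = pool.n_free_blocks()
    pool.free_all_blocks()
    return cached


def disable_pinned_pool(cp):
    """Turn off pooling of pinned host memory for later allocations."""
    cp.cuda.set_pinned_memory_allocator(None)


if cupy is not None:
    import numpy as np

    a = np.ones((10**6,), dtype=np.float32)  # scaled down from the report
    d = cupy.asarray(a)
    print("cached pinned blocks released:", release_pinned_pool(cupy))
```

Releasing the pool after a large transfer keeps pooling for later small transfers, while disabling it trades transfer performance for a smaller host footprint.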