NightMachinery closed this issue.
This is likely coming from CuPy's asarray, which is used in the DataFrame constructor when you pass a numpy array.
Would you be open to filing this issue instead at https://github.com/cupy/cupy/issues/ to consolidate discussion?
%load_ext memory_profiler
import numpy as np
import cupy

for n in (7, 8, 9):
    a = np.ones((10**n,), dtype=np.float32)
    %memit cupy.asarray(a)
peak memory: 514.58 MiB, increment: 350.82 MiB
peak memory: 1369.92 MiB, increment: 512.01 MiB
peak memory: 8899.16 MiB, increment: 4096.00 MiB
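As a rough sanity check on these numbers, the sketch below compares each %memit increment with the raw size of the float32 array being converted (4 bytes per element). The increment values are copied from the output above; the names MIB, ITEMSIZE, observed_increment_mib and array_mib are illustrative, not part of any library.

```python
MIB = 1024**2
ITEMSIZE = 4  # bytes per np.float32 element

# "increment" values reported by %memit above, in MiB, keyed by n
observed_increment_mib = {7: 350.82, 8: 512.01, 9: 4096.00}


def array_mib(n):
    """Size in MiB of np.ones((10**n,), dtype=np.float32)."""
    return 10**n * ITEMSIZE / MIB


for n, increment in observed_increment_mib.items():
    size = array_mib(n)
    print(
        f"n={n}: array {size:8.2f} MiB, "
        f"increment {increment:8.2f} MiB, "
        f"increment/array {increment / size:5.2f}"
    )
```

For n = 9 the increment is within about 7% of the array size, whereas for n = 7 it is roughly nine times larger, so the gap between "increment" and "data actually converted" shows up mainly at the smaller sizes.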
@beckernick I filed an issue there, but this can only explain about half of the excess memory consumption.
Thanks. The differences in the tests above for the smaller data sizes (the result is the same for the ~9 GB case) are probably related to how CuPy uses pool allocators for performance.
Is CuPy's use of CPU memory affecting a cuDF or Dask workload negatively?
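One way to see why a pool allocator can make the measured increment differ from the array size is a toy caching allocator. The class below is a hypothetical, simplified model, not CuPy's actual pool: it assumes requests are rounded up to the next power of two and that freed blocks are cached for reuse instead of being returned to the operating system. Under that assumption a 10**8-element float32 buffer (400,000,000 bytes) reserves 512 MiB and a 10**9-element one reserves 4096 MiB, which happens to match the 512.01 MiB and 4096.00 MiB increments measured above; treat this as suggestive only.

```python
class ToyCachingPool:
    """Hypothetical caching allocator (illustration only, not CuPy's pool).

    Requests are rounded up to a power of two, and freed blocks are kept
    in per-size free lists for reuse instead of being released.
    """

    def __init__(self):
        self.reserved = 0  # total bytes ever obtained from the "system"
        self.free_blocks = {}  # block size -> number of cached free blocks

    @staticmethod
    def round_up(nbytes):
        """Smallest power of two >= nbytes (and at least 1)."""
        return 1 << max(nbytes - 1, 0).bit_length()

    def malloc(self, nbytes):
        size = self.round_up(nbytes)
        if self.free_blocks.get(size, 0) > 0:
            self.free_blocks[size] -= 1  # reuse a cached block
        else:
            self.reserved += size  # grow the pool
        return size  # in this toy, a block is identified by its size

    def free(self, size):
        # Freed blocks go back to the cache; `reserved` never shrinks.
        self.free_blocks[size] = self.free_blocks.get(size, 0) + 1


MIB = 1024**2
pool = ToyCachingPool()
block = pool.malloc(4 * 10**8)  # one 10**8-element float32 buffer
print(pool.reserved // MIB)  # 512
pool.free(block)
pool.malloc(4 * 10**8)  # served from the cache, no growth
print(pool.reserved // MIB)  # 512
```

Because freed blocks stay cached, the reserved size never shrinks, so a process can keep a high resident memory footprint after a conversion finishes even though the data itself is no longer needed.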
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
Closing this issue, as this is generally expected behavior and implementing a cap for CPU pinned memory that affects other libraries by default is out of scope for cuDF. Please take further discussion to the CuPy issue linked above.
Describe the bug
Creating a DataFrame from a numpy array consumes too much RAM.
Steps/Code to reproduce bug
This will result in the following main memory (not GPU memory!) usages:
Expected behavior
The numpy array itself occupies the following (comment out the last two lines to see this):
The expected behavior is that the conversion should happen with some constant O(1) overhead, not an O(n) overhead.
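The O(1)-versus-O(n) distinction can be illustrated on the host alone with NumPy: np.asarray on an existing float32 array returns the same buffer without allocating, while np.array copies it by default. This is a host-only analogy under stated assumptions; it does not model the host-to-device transfer that cupy.asarray has to perform, and it relies on NumPy registering its data buffers with Python's tracemalloc (which NumPy has done since 1.13). The helper peak_bytes is hypothetical.

```python
import tracemalloc

import numpy as np


def peak_bytes(fn):
    """Run fn() and return (peak traced host bytes, fn's result).

    Relies on NumPy reporting its data buffers to tracemalloc.
    """
    tracemalloc.start()
    try:
        result = fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak, result


a = np.ones((10**6,), dtype=np.float32)  # 4,000,000 bytes of data

# asarray: no copy is needed, so the extra memory is O(1) in the array size
view_peak, view = peak_bytes(lambda: np.asarray(a))
# array: copies by default, so the extra memory is O(n)
copy_peak, copied = peak_bytes(lambda: np.array(a))

print(f"array size:   {a.nbytes} bytes")
print(f"asarray peak: {view_peak} bytes, shares memory: {np.shares_memory(view, a)}")
print(f"array peak:   {copy_peak} bytes, shares memory: {np.shares_memory(copied, a)}")
```

A conversion with constant overhead should look like the asarray case plus the unavoidable device allocation, rather than like the copying case repeated on the host.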