rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] cuDF-python import-time error with pyarrow #13679

Closed: GregoryKimball closed this issue 1 year ago

GregoryKimball commented 1 year ago

Describe the bug

@benfred reported an import-time error with cuDF-python:

In [13]: import cudf
df <jemalloc>: Unsupported system page size
<jemalloc>: Unsupported system page size
---------------------------------------------------------------------------
ArrowMemoryError                          Traceback (most recent call last)
<ipython-input-13-e13365c50bc4> in <module>
----> 1 import cudf

/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/__init__.py in <module>
     74 from cudf.core.tools.datetimes import DateOffset, date_range, to_datetime
     75 from cudf.core.tools.numeric import to_numeric
---> 76 from cudf.io import (
     77     from_dlpack,
     78     read_avro,

/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/io/__init__.py in <module>
      7 from cudf.io.json import read_json
      8 from cudf.io.orc import read_orc, read_orc_metadata, to_orc
----> 9 from cudf.io.parquet import (
     10     ParquetDatasetWriter,
     11     merge_parquet_filemetadata,

/opt/conda/envs/rapids/lib/python3.10/site-packages/cudf/io/parquet.py in <module>
     15 import numpy as np
     16 import pandas as pd
---> 17 from pyarrow import dataset as ds, parquet as pq
     18 
     19 import cudf

/opt/conda/envs/rapids/lib/python3.10/site-packages/pyarrow/dataset.py in <module>
     21 from pyarrow.util import _is_iterable, _stringify_path, _is_path_like
     22 
---> 23 from pyarrow._dataset import (  # noqa
     24     CsvFileFormat,
     25     CsvFragmentScanOptions,

/opt/conda/envs/rapids/lib/python3.10/site-packages/pyarrow/_dataset.pyx in init pyarrow._dataset()

/opt/conda/envs/rapids/lib/python3.10/site-packages/pyarrow/_compute.pyx in pyarrow._compute.Expression._scalar()

/opt/conda/envs/rapids/lib/python3.10/site-packages/pyarrow/scalar.pxi in pyarrow.lib.scalar()

/opt/conda/envs/rapids/lib/python3.10/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

/opt/conda/envs/rapids/lib/python3.10/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowMemoryError: malloc of size 64 failed

Steps/Code to reproduce bug

Build cudf from source, then attempt to import cudf in Python.

Expected behavior

The import succeeds.
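For anyone triaging a similar failure, the following is a minimal diagnostic sketch, not part of the original report. It assumes the `<jemalloc>: Unsupported system page size` message means the jemalloc build bundled with Arrow does not match the machine's page size, which this thread does not confirm. The helper names `system_page_size` and `describe_arrow_pool` are hypothetical; `mmap.PAGESIZE` and `pyarrow.default_memory_pool().backend_name` are existing APIs.

```python
import importlib.util
import mmap


def system_page_size() -> int:
    """Return the operating system's memory page size in bytes."""
    return mmap.PAGESIZE


def describe_arrow_pool() -> str:
    """Name the allocator behind pyarrow's default memory pool.

    Returns a descriptive string instead of raising, so the check can
    run on machines where pyarrow is missing or fails to initialize.
    """
    if importlib.util.find_spec("pyarrow") is None:
        return "pyarrow not installed"
    try:
        import pyarrow as pa

        # backend_name is e.g. "jemalloc", "mimalloc", or "system"
        return pa.default_memory_pool().backend_name
    except Exception as exc:  # report the failure rather than crash
        return f"pyarrow failed to initialize: {exc!r}"


if __name__ == "__main__":
    print("page size (bytes):", system_page_size())
    print("arrow memory pool:", describe_arrow_pool())
```

If the pool reports `jemalloc` on a machine with an unusual page size, one thing to try is setting Arrow's documented `ARROW_DEFAULT_MEMORY_POOL=system` environment variable before starting Python. Whether that avoids this particular error is untested here.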


vyasr commented 1 year ago

I believe this will be fixed by https://github.com/conda-forge/arrow-cpp-feedstock/pull/1116, correct?

benfred commented 1 year ago

This is fixed with the changes to the arrow-cpp-feedstock -

vyasr commented 1 year ago

Thanks!