rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] error: subprocess-exited-with-error,error: metadata-generation-failed #16278

Open HGanFan opened 1 month ago

HGanFan commented 1 month ago

Hi! When I try to install cudf with "pip install --extra-index-url https://pypi.nvidia.com cudf-cu11", I get the following error:

Collecting cudf-cu11
  Using cached cudf_cu11-24.6.1.tar.gz (2.6 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [104 lines of output]
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/wheel.py", line 177, in download_wheel
      return download_manual(wheel_directory, distribution, version)
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/wheel.py", line 137, in download_manual
      index_response = urlopen_with_retry(f"{NVIDIA_PIP_INDEX_URL}/{distribution}/")
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/wheel.py", line 70, in urlopen_with_retry
      return urlopen(url, **kwargs)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 216, in urlopen
      return opener.open(url, data, timeout)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 519, in open
      response = self._open(req, data)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 536, in _open
      result = self._call_chain(self.handle_open, protocol, protocol +
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 496, in _call_chain
      result = func(*args)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 1391, in https_open
      return self.do_open(http.client.HTTPSConnection, req,
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 1351, in do_open
      raise URLError(err)
  Traceback (most recent call last):
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 1348, in do_open
      h.request(req.get_method(), req.selector, req.data, headers,
    File "/data/anaconda3/envs/largemodel/lib/python3.10/http/client.py", line 1276, in request
      self._send_request(method, url, body, headers, encode_chunked)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/http/client.py", line 1322, in _send_request
      self.endheaders(body, encode_chunked=encode_chunked)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/http/client.py", line 1271, in endheaders
      self._send_output(message_body, encode_chunked=encode_chunked)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/http/client.py", line 1031, in _send_output
      self.send(msg)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/http/client.py", line 969, in send
      self.connect()
    File "/data/anaconda3/envs/largemodel/lib/python3.10/http/client.py", line 1441, in connect
      super().connect()
    File "/data/anaconda3/envs/largemodel/lib/python3.10/http/client.py", line 940, in connect
      self.sock = self._create_connection(
    File "/data/anaconda3/envs/largemodel/lib/python3.10/socket.py", line 845, in create_connection
      raise err
    File "/data/anaconda3/envs/largemodel/lib/python3.10/socket.py", line 833, in create_connection
      sock.connect(sa)
  TimeoutError: [Errno 110] Connection timed out

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/wheel.py", line 177, in download_wheel
      return download_manual(wheel_directory, distribution, version)
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/wheel.py", line 137, in download_manual
      index_response = urlopen_with_retry(f"{NVIDIA_PIP_INDEX_URL}/{distribution}/")
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/wheel.py", line 70, in urlopen_with_retry
      return urlopen(url, **kwargs)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 216, in urlopen
      return opener.open(url, data, timeout)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 519, in open
      response = self._open(req, data)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 536, in _open
      result = self._call_chain(self.handle_open, protocol, protocol +
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 496, in _call_chain
      result = func(*args)
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 1391, in https_open
      return self.do_open(http.client.HTTPSConnection, req,
    File "/data/anaconda3/envs/largemodel/lib/python3.10/urllib/request.py", line 1351, in do_open
      raise URLError(err)
  urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/data/anaconda3/envs/largemodel/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/data/anaconda3/envs/largemodel/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/data/anaconda3/envs/largemodel/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
      whl_basename = backend.build_wheel(metadata_directory, config_settings)
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/buildapi.py", line 29, in build_wheel
      return download_wheel(pathlib.Path(wheel_directory), config_settings)
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/wheel.py", line 179, in download_wheel
      report_install_failure(distribution, version, exception_context)
    File "/tmp/pip-build-env-czfl19z8/overlay/lib/python3.10/site-packages/nvidia_stub/error.py", line 63, in report_install_failure
      raise InstallFailedError(
  nvidia_stub.error.InstallFailedError:
  *******************************************************************************

  The installation of cudf-cu11 for version 24.6.1 failed.

  This is a special placeholder package which downloads a real wheel package
  from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we
  cannot download the real wheel file to install.

  You might try installing this package via
  ```
  $ pip install --extra-index-url https://pypi.nvidia.com cudf-cu11
  ```

  Here is some debug information about your platform to include in any bug
  report:

  Python Version: CPython 3.10.0
  Operating System: Linux 3.10.0-1160.el7.x86_64
  CPU Architecture: x86_64
  Driver Version: 470.82
  CUDA Version: 11.4

  *******************************************************************************

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

When I instead try to install cudf from a wheel file, for example with "pip install cudf_cu11-24.6.1-cp310-cp310-manylinux_2_28_x86_64.whl", I get:

ERROR: cudf_cu11-24.6.1-cp310-cp310-manylinux_2_28_x86_64.whl is not a supported wheel on this platform.

Can you help me resolve this and install cudf successfully? I look forward to your reply!

bdice commented 1 month ago

Hi @HGanFan, it looks like you are using a system with glibc 2.17 (Enterprise Linux 7 / CentOS 7). RAPIDS has dropped support for this in the 24.06 release because CentOS 7 reached end of life as of June 2024. You can use RAPIDS 24.04 or update your Linux distribution (Ubuntu 20.04+ or RHEL / Rocky 8+ are supported, as are many other recent distributions). Though pip wheels will not be installable, you may also be able to install RAPIDS conda packages (until conda-forge migrates to a minimum of glibc 2.28) or Docker containers on your current OS. Please try some of those options and let us know if you need more help.
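The glibc point above can be checked locally. The following is a minimal sketch, not part of RAPIDS or pip: the helper names (`required_glibc`, `parse_glibc`, `wheel_fits_glibc`) are hypothetical, and it only handles the `manylinux_X_Y` tag form seen in the wheel filename from this issue. pip also checks the Python tag and CPU architecture, so a passing glibc check does not by itself guarantee the wheel installs.

```python
import platform
import re


def required_glibc(wheel_filename):
    """Return (major, minor) from a manylinux_X_Y platform tag, or None if absent."""
    match = re.search(r"manylinux_(\d+)_(\d+)_", wheel_filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def parse_glibc(version):
    """Turn a version string such as '2.17' into the tuple (2, 17)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def wheel_fits_glibc(wheel_filename, glibc_version):
    """True/False if the glibc requirement is met; None if the tag is not recognized."""
    needed = required_glibc(wheel_filename)
    if needed is None:
        return None
    return parse_glibc(glibc_version) >= needed


if __name__ == "__main__":
    # Wheel filename taken from the error message in this issue.
    wheel = "cudf_cu11-24.6.1-cp310-cp310-manylinux_2_28_x86_64.whl"
    libc, version = platform.libc_ver()  # e.g. ('glibc', '2.17') on CentOS 7
    if libc == "glibc" and version:
        print(f"local glibc: {version}")
        print(f"glibc new enough for {wheel}: {wheel_fits_glibc(wheel, version)}")
    else:
        print("could not detect a glibc version on this interpreter")
```

On a glibc 2.17 system, the `manylinux_2_28` wheel fails this check, which is consistent with the "not a supported wheel on this platform" error reported above.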

HGanFan commented 1 month ago

Hi @bdice,

My system information is "Linux version 3.10.0-1160.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) )".

I followed your suggestion and installed cudf=23.8, but when I import cudf, I get the following error:

  import cudf
  Traceback (most recent call last):
    File "", line 1, in <module>
    File "/data/anaconda3/envs/largemodel2/lib/python3.9/site-packages/cudf/__init__.py", line 10, in <module>
      validate_setup()
    File "/data/anaconda3/envs/largemodel2/lib/python3.9/site-packages/cudf/utils/gpu_utils.py", line 95, in validate_setup
      cuda_runtime_version = runtimeGetVersion()
    File "/data/anaconda3/envs/largemodel2/lib/python3.9/site-packages/rmm/_cuda/gpu.py", line 88, in runtimeGetVersion
      major, minor = numba.cuda.runtime.get_version()
    File "/data/anaconda3/envs/largemodel2/lib/python3.9/site-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
      self.cudaRuntimeGetVersion(ctypes.byref(rtver))
    File "/data/anaconda3/envs/largemodel2/lib/python3.9/site-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
      self._initialize()
    File "/data/anaconda3/envs/largemodel2/lib/python3.9/site-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
      self.lib = open_cudalib('cudart')
    File "/data/anaconda3/envs/largemodel2/lib/python3.9/site-packages/numba/cuda/cudadrv/libs.py", line 65, in open_cudalib
      return ctypes.CDLL(path)
    File "/data/anaconda3/envs/largemodel2/lib/python3.9/ctypes/__init__.py", line 382, in __init__
      self._handle = _dlopen(self._name, mode)
  OSError: libcudart.so: cannot open shared object file: No such file or directory

My CUDA version is 11.8, and I installed it by following the instructions at "https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=CentOS&target_version=7&target_type=runfile_local".

Can you help me fix this error? I look forward to your reply!

HGanFan commented 1 month ago

Hi @bdice, can you help me fix this error?

bdice commented 1 month ago

@HGanFan, sorry for the delay. Can you share the output of nvidia-smi? Also, do you see the libcudart.so library on your system? It may have installed to a location like /usr/local/cuda/.... I apologize, I won't be able to debug your system very effectively, but perhaps the above steps will help. Also, the RAPIDS 23.08 release is fairly old. If you're stuck with CentOS 7, you may want to try 24.04 instead. It is the final release with CentOS 7 support.
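One way to check whether libcudart.so is present is to search the likely install directories. This is a minimal sketch under assumptions: `find_libcudart` is a hypothetical helper, and the `/usr/local/cuda*` search roots are a guess based on the location mentioned above, so adjust them to wherever the runfile installer actually put the toolkit.

```python
import ctypes.util
from pathlib import Path


def find_libcudart(roots):
    """Return sorted paths of files named libcudart.so* found under the given roots."""
    hits = set()
    for root in roots:
        root = Path(root)
        if root.is_dir():
            hits.update(str(path) for path in root.rglob("libcudart.so*"))
    return sorted(hits)


if __name__ == "__main__":
    # What the dynamic loader can already resolve (None if nothing is found).
    print("loader resolves cudart to:", ctypes.util.find_library("cudart"))

    # Assumed install locations; a runfile install may have used a different prefix.
    roots = sorted(Path("/usr/local").glob("cuda*"))
    matches = find_libcudart(roots)
    if matches:
        for path in matches:
            print("found:", path)
    else:
        print("no libcudart.so* found under", [str(r) for r in roots])
```

If the library turns up in a directory that the loader does not search, adding that directory to `LD_LIBRARY_PATH` before starting Python is the usual way to make it visible. Whether that resolves the import error on this particular system is not confirmed here.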