huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

`FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'` #167

Closed NouamaneTazi closed 4 months ago

NouamaneTazi commented 4 months ago

When trying to install flash-attn:

pip install "flash-attn>=2.5.0" --no-build-isolation

I run into the following issue:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-2kh9k980/flash-attn_fa45a608924c454f9022317b43e6991b/setup.py", line 113, in <module>
          _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
        File "/tmp/pip-install-2kh9k980/flash-attn_fa45a608924c454f9022317b43e6991b/setup.py", line 65, in get_cuda_bare_metal_version
          raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
        File "/home/user/miniconda/lib/python3.9/subprocess.py", line 424, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/home/user/miniconda/lib/python3.9/subprocess.py", line 505, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/home/user/miniconda/lib/python3.9/subprocess.py", line 951, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/home/user/miniconda/lib/python3.9/subprocess.py", line 1821, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'
      [end of output]

NouamaneTazi commented 4 months ago

This means the CUDA toolkit, which provides nvcc, is not installed (or only partially installed). We need to install a CUDA toolkit version compatible with the driver, i.e. the CUDA version shown by nvidia-smi. You can find the command to run here: https://anaconda.org/nvidia/cuda-toolkit

For example, for CUDA 12.2:

conda install -y nvidia/label/cuda-12.2.0::cuda-toolkit
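
If you are unsure which version to pick, here is a minimal sketch that reads the "CUDA Version" field from the nvidia-smi header and prints a matching install command. The helper name cuda_version_from_smi is made up for illustration, and the ".0" patch suffix in the label is an assumption: check the labels actually listed on the anaconda.org page above before running the printed command.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi output on stdin and print the
# "CUDA Version" it reports (e.g. 12.2). Prints nothing if the field is absent.
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  ver=$(nvidia-smi | cuda_version_from_smi)
  if [ -n "$ver" ]; then
    # Assumption: a label with patch level 0 exists for this version.
    echo "conda install -y nvidia/label/cuda-${ver}.0::cuda-toolkit"
  else
    echo "could not read a CUDA version from nvidia-smi" >&2
  fi
else
  echo "nvidia-smi not found on PATH" >&2
fi
```

This only prints the command instead of running it, so you can compare the label with the channel before installing.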

Then you can check that the CUDA tools were correctly installed by running:

nvcc -V

If nvcc is still not found, you should run:

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:$CUDA_HOME/bin

Then try again:

nvcc -V
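
The traceback shows that setup.py runs cuda_dir + "/bin/nvcc", so what matters is that CUDA_HOME points at a directory that really contains bin/nvcc. If the toolkit was installed with conda, nvcc may live under the active environment rather than /usr/local/cuda. Here is a rough sketch that picks the first candidate directory containing an executable nvcc. The find_cuda_home name and the candidate list (in particular $CONDA_PREFIX) are assumptions for illustration, not something flash-attn or nanotron provide.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory among the arguments that
# contains an executable bin/nvcc; return non-zero if none does.
find_cuda_home() {
  for dir in "$@"; do
    if [ -n "$dir" ] && [ -x "$dir/bin/nvcc" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Candidates, in order: an already-set CUDA_HOME, the active conda
# environment (assumed location for a conda-installed toolkit), then
# the default path from the error message.
if cuda_home=$(find_cuda_home "${CUDA_HOME:-}" "${CONDA_PREFIX:-}" /usr/local/cuda); then
  export CUDA_HOME="$cuda_home"
  export PATH="$CUDA_HOME/bin:$PATH"
  echo "Using CUDA_HOME=$CUDA_HOME"
else
  echo "nvcc not found in any candidate directory" >&2
fi
```

Source the script (rather than executing it) if you want the exported variables to stay set in your current shell before retrying the pip install.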