[BUG] Cannot install through any means

Linardos commented 1 week ago

I followed the installation steps exactly as described, but sadly none of the options work.

The package is available on neither pip nor conda:

(gsynth) locolinux2@IN-OTA-232347:~$ pip install gandlf-synth
ERROR: Could not find a version that satisfies the requirement gandlf-synth (from versions: none)
ERROR: No matching distribution found for gandlf-synth
(gsynth) locolinux2@IN-OTA-232347:~$ conda install -c conda-forge gandlf-synth -y
Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  - gandlf-synth

Current channels:

  - https://conda.anaconda.org/conda-forge
  - defaults

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Nor does it work when cloning the repository and installing directly:

(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ pip install .
Processing /home/locolinux2/GaNDLF-Synth
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting GANDLF@ git+https://github.com/mlcommons/GandLF.git@master (from gandlf_synth==0.0.1.dev0)
  Cloning https://github.com/mlcommons/GandLF.git (to revision master) to /tmp/pip-install-nkslaka2/gandlf_717cd84cc6b14cd39c48d0d1e6c9a5ec
  Running command git clone --filter=blob:none --quiet https://github.com/mlcommons/GandLF.git /tmp/pip-install-nkslaka2/gandlf_717cd84cc6b14cd39c48d0d1e6c9a5ec
  Resolved https://github.com/mlcommons/GandLF.git to commit a1fb3f49f1ef0b148d9c4b0826e840d38d0bae38
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting black==23.11.0 (from gandlf_synth==0.0.1.dev0)
  Using cached black-23.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (66 kB)
Collecting lightning==2.4.0 (from gandlf_synth==0.0.1.dev0)
  Using cached lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Collecting monai-generative==0.2.3 (from gandlf_synth==0.0.1.dev0)
  Using cached monai_generative-0.2.3-py3-none-any.whl.metadata (4.6 kB)
Collecting deepspeed==0.15.1 (from gandlf_synth==0.0.1.dev0)
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-nkslaka2/deepspeed_584e40eaecaf4247a29a048c4c72290b/setup.py", line 108, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
        File "/tmp/pip-install-nkslaka2/deepspeed_584e40eaecaf4247a29a048c4c72290b/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

sarthakpati commented 1 week ago

Were you able to install PyTorch?

https://docs.mlcommons.org/GaNDLF-Synth/setup/#installation

Linardos commented 1 week ago

Yes

sarthakpati commented 1 week ago

@szmazurek: can you think of anything for this? I am unable to replicate it on 3 machines (Windows, Ubuntu, Mint). I have put together a small script to get some debugging information from the environment here. Can you think of anything else to add?
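The linked script is not reproduced in this thread. As a rough sketch of the kind of information that could be useful for this bug, something like the following might work. The function name `collect_debug_info` and the chosen fields are assumptions made for illustration, not the contents of the actual script.

```python
import importlib.util
import os
import platform
import shutil
import sys
from typing import Dict, Optional


def collect_debug_info() -> Dict[str, Optional[str]]:
    """Gather basic facts about the interpreter and CUDA tooling (hypothetical helper)."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "executable": sys.executable,
        # Only checks whether torch can be located; it is not imported here.
        "torch_found": str(importlib.util.find_spec("torch") is not None),
        # Full path to nvcc if it is on PATH, otherwise None.
        "nvcc": shutil.which("nvcc"),
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "CONDA_PREFIX": os.environ.get("CONDA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in collect_debug_info().items():
        print(f"{key}: {value}")
```

A value of None for both `nvcc` and `CUDA_HOME` would be consistent with the MissingCUDAException shown in the traceback above.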

szmazurek commented 1 week ago

Yeah, the PyPI failure is expected: AFAIK we have not built and uploaded the package there. As for installing from source, it seems you are missing the NVIDIA compiler (nvcc), which is apparently needed by the deepspeed dependency. Can you check whether nvcc is installed, @Linardos? If not, installing it may do the trick. The next thing to check is your PATH: make sure all NVIDIA-related binaries are accessible.
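To make the nvcc and PATH check concrete, here is a minimal sketch. It assumes the build looks for the CUDA toolkit roughly in this order: the CUDA_HOME (or CUDA_PATH) environment variable, then an nvcc found on PATH, then /usr/local/cuda. This is a simplified illustration, not the code DeepSpeed or PyTorch actually run, and `guess_cuda_home` is a hypothetical name.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def guess_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    default_dir: str = "/usr/local/cuda",
) -> Optional[str]:
    """Guess the CUDA toolkit root; return None if nothing is found."""
    # 1. An explicitly set environment variable wins.
    cuda_home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if cuda_home:
        return cuda_home
    # 2. Otherwise derive the root from an nvcc on PATH (<root>/bin/nvcc).
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Finally fall back to the conventional Linux location, if it exists.
    return default_dir if os.path.isdir(default_dir) else None


if __name__ == "__main__":
    print("nvcc on PATH:", shutil.which("nvcc"))
    print("guessed CUDA home:", guess_cuda_home())
```

If this prints None for both values inside the activated environment, the "CUDA_HOME does not exist" error is the expected outcome.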

sarthakpati commented 1 week ago

If NVCC is needed, perhaps it might make sense to include it in the documentation. I believe installing one of the following (based on the user's system) should be fine:

CUDA11: https://pypi.org/project/nvidia-cuda-nvcc-cu11/
CUDA12: https://pypi.org/project/nvidia-cuda-nvcc-cu12/

Thanks for helping us catch this, @Linardos! I am guessing that since all of my (and Szymon's) machines are set up for development, nvcc is automatically found and we don't encounter this.

Relevant issue from DeepSpeed: https://github.com/microsoft/DeepSpeed/issues/2772

EDIT: I also found a cuda-python package on pip but I think that's only for CUDA12.

szmazurek commented 6 days ago

Yeah, this would indeed be needed. @Linardos, please confirm whether the issue opened by @sarthakpati (#25) addresses that.

Linardos commented 6 days ago

I just installed it through pip, but that doesn't seem to solve it. I have CUDA 12.4 on my machine:

(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ pip install nvidia-cuda-nvcc-cu12
Requirement already satisfied: nvidia-cuda-nvcc-cu12 in /home/locolinux2/miniconda3/envs/gsynth/lib/python3.9/site-packages (12.6.77)
(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ pip install .
Processing /home/locolinux2/GaNDLF-Synth
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting GANDLF@ git+https://github.com/mlcommons/GandLF.git@master (from gandlf_synth==0.0.1.dev0)
  Cloning https://github.com/mlcommons/GandLF.git (to revision master) to /tmp/pip-install-kuttpdr9/gandlf_8583bbd08e34436b9794e4167f37ac38
  Running command git clone --filter=blob:none --quiet https://github.com/mlcommons/GandLF.git /tmp/pip-install-kuttpdr9/gandlf_8583bbd08e34436b9794e4167f37ac38
  Resolved https://github.com/mlcommons/GandLF.git to commit 709f6ab59e57782f0b1937b24a1d8a85cd222c42
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting black==23.11.0 (from gandlf_synth==0.0.1.dev0)
  Using cached black-23.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (66 kB)
Collecting lightning==2.4.0 (from gandlf_synth==0.0.1.dev0)
  Using cached lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Collecting monai-generative==0.2.3 (from gandlf_synth==0.0.1.dev0)
  Using cached monai_generative-0.2.3-py3-none-any.whl.metadata (4.6 kB)
Collecting deepspeed==0.15.1 (from gandlf_synth==0.0.1.dev0)
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-kuttpdr9/deepspeed_fd7409a5b2dd41cda27dd8d978d665d2/setup.py", line 108, in <module>
          cuda_major_ver, cuda_minor_ver = installed_cuda_version()
        File "/tmp/pip-install-kuttpdr9/deepspeed_fd7409a5b2dd41cda27dd8d978d665d2/op_builder/builder.py", line 51, in installed_cuda_version
          raise MissingCUDAException("CUDA_HOME does not exist, unable to compile CUDA op(s)")
      op_builder.builder.MissingCUDAException: CUDA_HOME does not exist, unable to compile CUDA op(s)
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

However, I installed nvcc through sudo apt install nvidia-cuda-toolkit instead and that worked.

(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
(gsynth) locolinux2@IN-OTA-232347:~/GaNDLF-Synth$ pip install .
Processing /home/locolinux2/GaNDLF-Synth
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting GANDLF@ git+https://github.com/mlcommons/GandLF.git@master (from gandlf_synth==0.0.1.dev0)
  Cloning https://github.com/mlcommons/GandLF.git (to revision master) to /tmp/pip-install-1letf4zy/gandlf_9ec66c16bdc542989ba33faa0c893907
  Running command git clone --filter=blob:none --quiet https://github.com/mlcommons/GandLF.git /tmp/pip-install-1letf4zy/gandlf_9ec66c16bdc542989ba33faa0c893907
  Resolved https://github.com/mlcommons/GandLF.git to commit 709f6ab59e57782f0b1937b24a1d8a85cd222c42
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting black==23.11.0 (from gandlf_synth==0.0.1.dev0)
  Using cached black-23.11.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (66 kB)
Collecting lightning==2.4.0 (from gandlf_synth==0.0.1.dev0)
  Using cached lightning-2.4.0-py3-none-any.whl.metadata (38 kB)
Collecting monai-generative==0.2.3 (from gandlf_synth==0.0.1.dev0)
  Using cached monai_generative-0.2.3-py3-none-any.whl.metadata (4.6 kB)
Collecting deepspeed==0.15.1 (from gandlf_synth==0.0.1.dev0)
  Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
  Preparing metadata (setup.py) ... done
Collecting click>=8.0.0 (from black==23.11.0->gandlf_synth==0.0.1.dev0)
  Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
...

It seems to have been installed successfully.

sarthakpati commented 6 days ago

I think any "solution" that involves a sudo install is inherently problematic (security issues, and not everyone has root-level access). Is there any way to check whether this would work using conda instead?

Linardos commented 6 days ago

This one should work, so maybe add that step to the README (I didn't test it, but these seem to be the standard steps with conda):

conda install -c nvidia cudatoolkit

Verify your installation with nvcc --version
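For anyone who wants to script that verification step, here is a small sketch that runs nvcc --version and extracts the release number. The helper names `parse_nvcc_release` and `installed_nvcc_release` are hypothetical, and the regular expression assumes the output contains a "release X.Y" fragment, as in the nvcc output pasted earlier in this thread.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_nvcc_release() -> Optional[Tuple[int, int]]:
    """Run nvcc if it is on PATH and return its release, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_nvcc_release()
    if release is None:
        print("nvcc not found on PATH")
    else:
        print("nvcc release %d.%d" % release)
```

Running it inside the activated conda environment shows whether that environment, rather than the system, provides nvcc.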

sarthakpati commented 6 days ago

Cool. In this case, we need to have an explicit dependency on conda.

szmazurek commented 5 days ago

I do not think that having nvcc as an underlying requirement is problematic from the user's perspective; it is basically something you need alongside the CUDA drivers for this package. Falling back to conda is one solution, but I would not push it as the only option, rather as a workaround (it can also be included in the container).

sarthakpati commented 5 days ago

Since it is on the user-level, I think conda should be the primary solution. Anything that is system-level (i.e., sudo install or equivalent) should be the fallback.

mlcommons / GaNDLF-Synth

[BUG] Cannot install through any means #24