gschramm / parallelproj

code for parallel TOF and NONTOF projections
MIT License

Issue Regarding Installation Conflicts Between parallelproj and PyTorch #76

Closed SS99aaNN closed 2 months ago

SS99aaNN commented 2 months ago

Dear developer,

I tried to install both parallelproj and pytorch-CUDA on Linux, but I encountered some issues.

If I follow the instructions in the documentation and run conda install -c conda-forge parallelproj pytorch, the installed torch is the CPU version.

However, when I try to install them separately (regardless of whether I install parallelproj first or pytorch first), the installation process of the second package always prompts dependency conflicts and fails to install after checking.

The anaconda environment I tried to install is as follows: python==3.9, cudatoolkit==11.8, pytorch==2.2.1 / 2.3.1.

I would like to know if parallelproj and pytorch-CUDA need to be in specific versions to coexist, and if so, what the compatible version combination is?

gschramm commented 2 months ago

Thanks for reporting this. On which platform (linux, windows, ...) do you observe this?

SS99aaNN commented 2 months ago

I have tried it on both Mac and Linux (Ubuntu 20.04.3), both with conda 23.3.1. Both showed the same output when I installed these two packages with conda, something like this:

image

SS99aaNN commented 2 months ago

By the way, nothing gets installed even if I wait until it finishes.

gschramm commented 2 months ago

ok. I will have a look (the conda package config was recently changed).

To debug this, could you execute the following and send the output:

conda create -c conda-forge -n test parallelproj
conda activate test
conda list | grep parallelproj

I would like to see which libparallelproj version (cpu or cuda) gets installed if you only install parallelproj
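The check above can also be scripted. The following is a minimal sketch, assuming that `conda list` prints whitespace-separated columns (name, version, build, channel) and that the build string starts with `cpu` or `cuda`, as in the build strings quoted in this thread (`cpu_h740bc59_1`, `cuda120*`). The function name `build_variant` is hypothetical and is not part of conda or parallelproj.

```python
from typing import Optional


def build_variant(conda_list_output: str, package: str = "libparallelproj") -> Optional[str]:
    """Return 'cuda', 'cpu' or 'unknown' for `package`, or None if it is not listed."""
    for line in conda_list_output.splitlines():
        fields = line.split()
        # skip header/comment lines and lines that belong to other packages
        if len(fields) < 3 or fields[0] != package:
            continue
        build = fields[2].lower()  # third column is the build string
        if build.startswith("cuda"):
            return "cuda"
        if build.startswith("cpu"):
            return "cpu"
        return "unknown"
    return None


sample = """\
libparallelproj           1.9.1            cpu_h740bc59_1    conda-forge
parallelproj              1.9.1            pyha770c72_201    conda-forge
"""
print(build_variant(sample))  # cpu
```

In practice the text would come from running `conda list` inside the activated environment; the hard-coded sample only keeps the sketch self-contained.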

gschramm commented 2 months ago

Ok. I can reproduce this bug :-(

However, here is a quick work-around to fix this (first install parallelproj and then pytorch in a fresh env.) which works for me on ubuntu:

conda create -c conda-forge -n new_env parallelproj
conda activate new_env
conda install -c conda-forge pytorch
conda list | grep parallelproj
conda list | grep pytorch

SS99aaNN commented 2 months ago

output of your steps:

libparallelproj           1.9.1            cpu_h740bc59_1    conda-forge
parallelproj              1.9.1            pyha770c72_201    conda-forge
pytorch                   2.3.1           cpu_generic_py312h2f1fc2b_0    conda-forge

It seems that it can't use the GPU. When I import it, it shows this:

image

The system CUDA version is 12.5; I'm not sure whether the issue comes from this. I will try installing cudatoolkit=11.8 from conda.

gschramm commented 2 months ago

the system cuda should not matter, since a matching cudatoolkit version should be installed from mamba.

Which parallelproj version gets installed if you only install parallelproj and nothing else in a new environment?

SS99aaNN commented 2 months ago

this one

libparallelproj           1.9.1            cpu_h740bc59_1    conda-forge
parallelproj              1.9.1            pyha770c72_201    conda-forge

hannah-saber commented 2 months ago

Hey, I installed as shown below but still get an ImportError. Should I set an environment variable? It confuses me.

image image

gschramm commented 2 months ago

You can set the env. variable "PARALLELPROJ_CUDA_LIB" manually. But it seems that you have a second libparallelproj_cuda in /usr/local/lib, which should not be the case if you installed from mamba / conda. Did you build it yourself? If so, I would remove it (or set the env. variable).

gschramm commented 2 months ago

@SS99aaNN new workaround (for the cpu version bug):

conda create -n test4 -c conda-forge libparallelproj parallelproj pytorch
conda activate test4
conda list | grep libparallelproj
conda list | grep pytorch

Can you try this and send the output?

hannah-saber commented 2 months ago

Yes, because I could not import parallelproj after installing it with conda, I built it from GitHub with CMake. Then I could import parallelproj, with the output shown below (but parallelproj reports cupy as False). When I try some of the parallelproj projection examples, I get OSError: no file with expected extension.

So I have to remove it or set the env. variable, then? I will try. Thanks!

gschramm commented 2 months ago

  1. I recommend not compiling it yourself unless you are a developer and know what you are doing. If you do compile it yourself, set the env. variables "PARALLELPROJ_C_LIB" and "PARALLELPROJ_CUDA_LIB" to point to the installed libs.
  2. Installing from conda-forge should work. Please run the commands above.

If you want to go with conda, remove the self-compiled libs.

hannah-saber commented 2 months ago

Yes, I want to go with conda. But after I removed the compiled libs, it shows OSError: no file with expected extension.

image

SS99aaNN commented 2 months ago

Here is the output in the terminal:

libparallelproj           1.9.1            cpu_h740bc59_1    conda-forge
parallelproj              1.9.1            pyha770c72_201    conda-forge

The output of the import test:

image

PyTorch seems better, but importing parallelproj still fails with an error.

gschramm commented 2 months ago

Hm. No idea what is going wrong. Seems to be a bug in the conda/mamba lib solver (I am talking to the conda-forge guys currently).

Can you run this final test for me:

conda create -n test5 -c conda-forge libparallelproj
conda activate test5
conda list | grep libparallelproj

SS99aaNN commented 2 months ago

The output is:

libparallelproj           1.9.1            cpu_h740bc59_1    conda-forge

Thank you very much for your response. Please notify me as soon as there are any updates. Best wishes

gschramm commented 2 months ago

Hm, this is already strange, since the solver should try to install the gpu package of libparallelproj. Maybe this is indeed related to cuda 12.5.

My last try for a workaround:

conda create -n test4 -c conda-forge libparallelproj parallelproj pytorch cudatoolkit
conda activate test4
conda list | grep libparallelproj
conda list | grep pytorch

This should install the cuda versions of libparallelproj, pytorch and the cudatoolkit 11.8.

SS99aaNN commented 2 months ago

It seems worse than the last test. The output in the terminal is:

libparallelproj           1.9.1            cpu_h740bc59_1    conda-forge
pytorch                   2.3.1           cpu_mkl_py312h3b258cc_100    conda-forge

The output at import:

image

The conda list shows that cudatoolkit=11.8 was installed correctly.

image

SS99aaNN commented 2 months ago

conda create -n test4 -c conda-forge libparallelproj parallelproj pytorch cudatoolkit

Oh, I didn't realize your command reuses the environment name from the earlier test; I thought it was a new test 😶‍🌫️ I wonder what environment you used in your test, and what it should show if everything is installed properly?

gschramm commented 2 months ago

Sorry, it was supposed to be a completely new env. (e.g. "test7"). On my Linux system, where I also run CUDA 12.5, all those workarounds work :/, so this gets very hard for me to debug.

Last question: Do you have the CUDA install dir in your $PATH environment variable?

SS99aaNN commented 2 months ago

As far as I know, CUDA is not installed in my user (non-root) directory, and there are no statements in my .bashrc pointing to CUDA. I hope my response is clear enough. btw in my attempts yesterday, I was able to correctly install parallelproj and it displayed CUDA properly during import. I hope this information can help in resolving the issue.

hannah-saber commented 2 months ago

conda create -n test4 -c conda-forge libparallelproj parallelproj pytorch cudatoolkit=11.8

It works! The problem is the CUDA version, I guess....

gschramm commented 2 months ago

good that this works for you. but it is supposed to also work with cuda 12 (it does for me with cuda 12.5)

gschramm commented 2 months ago

Can you figure out where CUDA is installed on your system and put it on your PATH? E.g. for me, nvcc sits in /usr/local/cuda/bin, so I added this to PATH in my .bashrc.

SS99aaNN commented 2 months ago

Are you suggesting that I add the system's CUDA path to the PATH environment variable before proceeding with the installation? Because I couldn't find nvcc in the bin directory of the conda environment. Adding the conda environment's bin directory to bashrc and sourcing it didn't change the outcome; the import still shows this issue.

image

SS99aaNN commented 2 months ago

I tried setting the environment variable PARALLELPROJ_CUDA_LIB to the directory that contains libparallelproj_c.so.1.9.0 (which should be .../envs/env-name/lib), and it shows another error.

image

gschramm commented 2 months ago

If the "cpu" version of "libparallelproj" is installed, "libparallelproj_cuda.so" does not get installed, so this won't work. If you want to set them manually, PARALLELPROJ_C_LIB / PARALLELPROJ_CUDA_LIB should contain the absolute paths of the library files themselves (libparallelproj_c.so, libparallelproj_cuda.so), not the directory where they are stored.
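A minimal sketch of that rule: each variable has to name the library file itself, not its directory. The helper name `set_parallelproj_lib` and the path in the usage comment are hypothetical; only the variable names and the library file names come from this thread. The variables presumably need to be set before parallelproj is imported.

```python
import os
from pathlib import Path


def set_parallelproj_lib(var: str, lib_file: str) -> str:
    """Point the env. variable `var` at a library file; reject directories and missing files."""
    path = Path(lib_file).expanduser().resolve()
    if path.is_dir():
        # this is the mistake discussed above: a directory instead of the file
        raise ValueError(f"{var} must be the library file, not the directory {path}")
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    os.environ[var] = str(path)
    return os.environ[var]


# Hypothetical usage; adjust the path to your own environment:
# set_parallelproj_lib("PARALLELPROJ_CUDA_LIB", "/path/to/env/lib/libparallelproj_cuda.so")
```

Failing early with a clear message seems preferable to letting the later import fail with a less specific error.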

SS99aaNN commented 2 months ago

I currently have no clue how to resolve this issue. 😮‍💨

SS99aaNN commented 2 months ago

Is it possible that the issue is caused by the conda version? My conda version is currently 23.3.1, and some warnings show up while packages are being installed.

image

Could this issue be causing the error?

gschramm commented 2 months ago

Hard to say. You can try to install the latest miniforge from https://github.com/conda-forge/miniforge which btw will also give you mamba (a much faster implementation of conda).

gschramm commented 2 months ago

@SS99aaNN : another thing to try is to use CONDA_OVERRIDE_CUDA as mentioned here: https://conda-forge.org/docs/user/tipsandtricks/#installing-cuda-enabled-packages-like-tensorflow-and-pytorch

the install would then look like:

CONDA_OVERRIDE_CUDA="12.0" conda install "parallelproj==1.9.1=cuda120*" -c conda-forge

To install the cuda 12.0 build

SS99aaNN commented 2 months ago

This works!!!!

conda create -n gpu-recon -c conda-forge libparallelproj parallelproj pytorch cupy cudatoolkit=11.8

No idea why, but it did install the CUDA versions of pytorch and parallelproj. Thank you very much for your patience and explanations.
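To confirm that such an environment really uses the GPU, a check along the following lines may help. This is a sketch under assumptions: `torch.cuda.is_available()` is the standard PyTorch call, while the attribute names `cuda_present` and `cupy_enabled` on the `parallelproj` module are assumed here and read defensively with `getattr`, so a missing attribute shows up as None instead of raising. The function name `gpu_report` is hypothetical.

```python
import importlib


def gpu_report() -> dict:
    """Collect GPU-related flags; None means the package or attribute is unavailable."""
    report = {"torch_cuda": None, "parallelproj_cuda": None, "parallelproj_cupy": None}

    try:
        torch = importlib.import_module("torch")
        report["torch_cuda"] = bool(torch.cuda.is_available())
    except (ImportError, OSError):
        pass  # PyTorch is not installed or its libraries could not be loaded

    try:
        pp = importlib.import_module("parallelproj")
        # attribute names are assumptions, hence the getattr defaults
        report["parallelproj_cuda"] = getattr(pp, "cuda_present", None)
        report["parallelproj_cupy"] = getattr(pp, "cupy_enabled", None)
    except (ImportError, OSError):
        pass  # not installed, or the compiled library was not found (see the OSError above)

    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Run inside the activated environment, a CPU-only install would be expected to report False or None for these flags.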

gschramm commented 2 months ago

Glad that we could find this workaround. Closing the issue, but I will update the README.