i4Ds / Karabo-Pipeline

The Karabo Pipeline can be used as a Digital Twin for SKA
https://i4ds.github.io/Karabo-Pipeline/
MIT License

Karabo-Pipeline not running with CUDA Version 12 #568

Closed anawas closed 4 months ago

anawas commented 4 months ago

I installed Karabo following the instructions in the documentation. Running the interferometer simulation from the example script produced the following output:

WARNING: AstropyDeprecationWarning: The private astropy._erfa module has been made into its own package, pyerfa, which is a dependency of astropy and can be imported directly using "import erfa" [astropy._erfa]
The RASCIL data directory is not available - continuing but any simulations will fail
Parameter 'use_gpus' is None! Using function 'karabo.util.gpu_util.is_cuda_available()'. To overwrite, set 'use_gpus' True or False.
Creating /tmp/karabo-STM-andreas-4S0i7QquzI/interferometer-sDn3r043oU for interferometer disk-cache.
Traceback (most recent call last):
  File "/home/andreas/karabo-test.py", line 40, in <module>
    simulation.run_simulation(telescope, sky, observation)
  File "/home/andreas/miniconda3/envs/karabo_cuda_12/lib/python3.9/site-packages/karabo/simulation/interferometer.py", line 356, in run_simulation
    return self.__setup_run_simulation_oskar(
  File "/home/andreas/miniconda3/envs/karabo_cuda_12/lib/python3.9/site-packages/karabo/simulation/interferometer.py", line 487, in __setup_run_simulation_oskar
    params_total = InterferometerSimulation.__run_simulation_oskar(
  File "/home/andreas/miniconda3/envs/karabo_cuda_12/lib/python3.9/site-packages/karabo/simulation/interferometer.py", line 510, in __run_simulation_oskar
    setting_tree = oskar.SettingsTree("oskar_sim_interferometer")
  File "/home/andreas/miniconda3/envs/karabo_cuda_12/lib/python3.9/site-packages/oskar/settings_tree.py", line 138, in __init__
    raise RuntimeError("OSKAR library not found.")
RuntimeError: OSKAR library not found.
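
To tell whether the failure comes from OSKAR itself rather than from the example script, the settings tree can be constructed on its own. The following is a minimal sketch, assuming only the call `oskar.SettingsTree("oskar_sim_interferometer")` that appears in the traceback; the helper name `check_oskar` is hypothetical and not part of Karabo or OSKAR.

```python
def check_oskar(app="oskar_sim_interferometer"):
    """Return (ok, message) telling whether OSKAR's settings tree can be built.

    `check_oskar` is a hypothetical helper, not part of Karabo or OSKAR.
    """
    try:
        # The import can fail if the package or its shared libraries are missing.
        import oskar

        # Same call that raises "OSKAR library not found." in the traceback.
        oskar.SettingsTree(app)
    except (ImportError, OSError, RuntimeError) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "OSKAR settings tree created"


if __name__ == "__main__":
    ok, message = check_oskar()
    print("OK" if ok else "FAILED", "-", message)
```

If this reports a failure, the problem sits in the OSKAR installation and not in the simulation parameters of the example script.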

Steps to reproduce

  1. Install Karabo following the instructions in the documentation
  2. Enter the example script from "Running an interferometer simulation"
  3. Run the script with `python .py`

Configuration

  1. uname -a: Linux awi4dsomen 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu Oct 5 21:02:42 UTC 2023 x86_64 GNU/Linux
  2. nvidia-smi:
    
    +---------------------------------------------------------------------------------------+
    | NVIDIA-SMI 545.46                 Driver Version: 546.80       CUDA Version: 12.3     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA GeForce RTX 4070 ...    On  | 00000000:01:00.0  On |                  N/A |
    | N/A   42C    P8               4W / 130W |   1433MiB /  8188MiB |      4%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

    +---------------------------------------------------------------------------------------+
    | Processes:                                                                            |
    |  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
    |        ID   ID                                                             Usage      |
    |=======================================================================================|
    |  No running processes found                                                           |
    +---------------------------------------------------------------------------------------+
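
Note that the "CUDA Version" in the nvidia-smi banner is the highest CUDA version the installed driver supports. It says nothing about which CUDA runtime libraries are installed in the conda environment. A minimal sketch for pulling both version numbers out of the banner line follows; the function name `parse_nvidia_smi_header` is hypothetical, and the sample line is copied from the output above.

```python
import re


def parse_nvidia_smi_header(text):
    """Extract the driver and CUDA versions from nvidia-smi banner text.

    Returns None for a field that cannot be found.
    """
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return {
        "driver": driver.group(1) if driver else None,
        "cuda": cuda.group(1) if cuda else None,
    }


# Banner line taken from the nvidia-smi output in this report.
header = "| NVIDIA-SMI 545.46                 Driver Version: 546.80       CUDA Version: 12.3     |"
versions = parse_nvidia_smi_header(header)
print(versions)
```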

anawas commented 4 months ago

I dug a bit deeper and found that libcudart.so.11.0 is missing on my system.

Steps to reproduce

  1. Change to path where package oskar is installed, e.g. /home/<user>/miniconda3/envs/karabo_cuda_12/lib/python3.9/site-packages/oskar/
  2. Open a Python shell: python
  3. Type import _apps_lib. On my machine I get the output:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory
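
The same check can be done without importing anything from oskar, by listing which libcudart files the environment actually contains. This is a rough sketch under the assumption that the conda environment keeps its shared libraries in `<prefix>/lib`; the helper name `find_shared_libs` is hypothetical.

```python
import sys
from pathlib import Path


def find_shared_libs(lib_dir, stem="libcudart.so"):
    """Return the sorted file names in lib_dir that start with stem."""
    lib_dir = Path(lib_dir)
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.iterdir() if p.name.startswith(stem))


if __name__ == "__main__":
    # Assumption: shared libraries of the active environment live in <prefix>/lib.
    env_lib = Path(sys.prefix) / "lib"
    found = find_shared_libs(env_lib)
    print(f"{env_lib}: {found or 'no libcudart found'}")
    if "libcudart.so.11.0" not in found:
        print("libcudart.so.11.0 is missing from this environment")
```

Run it with the Python interpreter of the affected environment. If only libcudart.so.12 variants show up, the ImportError above is expected.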
anawas commented 4 months ago

Workaround

Here's what fixed the issue for me. The package cuda-cudart was installed in version 12.4 instead of 11.7. Check it with

$ conda list cuda
# packages in environment at /home/<user>/miniconda3/envs/karabo:
#
# Name                    Version                   Build  Channel
cuda-crt-tools            12.4.131             ha770c72_1    conda-forge
cuda-cudart               12.4.127             he02047a_2    conda-forge
cuda-cudart_linux-64      12.4.127             h85509e4_2    conda-forge
cuda-nvcc-tools           12.4.131             hd3aeb46_1    conda-forge
cuda-nvrtc                11.7.50              hd0285e0_0    nvidia/label/cuda-11.7.0
cuda-nvtx                 12.4.127             he02047a_2    conda-forge
cuda-nvvm-tools           12.4.131             h59595ed_1    conda-forge
cuda-version              12.4                 h3060b56_3    conda-forge
ska-gridder-nifty-cuda    0.3.0            py39h76be34b_0    i4ds

If this happens (as it did on my computer), remove the cuda-cudart package:

$ conda remove cuda-cudart

This will uninstall all packages in the environment. Now you can install version 11.7 from the nvidia channel:

$ conda install -c nvidia/label/cuda-11.7.0 cuda-cudart

Then install the rest:

$ conda install -c nvidia/label/cuda-11.7.0 -c i4ds -c conda-forge karabo-pipeline

Now you'll have the correct version. Check it again with conda list cuda:

# packages in environment at /home/<user>/miniconda3/envs/karabo:
#
# Name                    Version                   Build  Channel
cuda-cudart               11.7.60              h9538e0e_0    nvidia/label/cuda-11.7.0
cuda-version              11.8                 h70ddcb2_3    conda-forge
cudatoolkit               11.8.0              h4ba93d1_13    conda-forge
ska-gridder-nifty-cuda    0.3.0            py39h76be34b_0    i4ds
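
The comparison of the two listings can also be scripted. The sketch below parses the text output of `conda list` and checks the major version of cuda-cudart; the helper names `cuda_package_versions` and `has_major` are hypothetical, and the sample rows are excerpts of the two listings in this comment.

```python
def cuda_package_versions(listing):
    """Map package name -> version for packages whose name starts with 'cuda'."""
    versions = {}
    for line in listing.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines printed by conda list.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("cuda"):
            versions[fields[0]] = fields[1]
    return versions


def has_major(version, major):
    """Return True if the version string starts with the given major number."""
    return version.split(".")[0] == str(major)


# Excerpts of the listings before and after the workaround.
before = """\
cuda-cudart               12.4.127             he02047a_2    conda-forge
cuda-nvrtc                11.7.50              hd0285e0_0    nvidia/label/cuda-11.7.0
"""
after = """\
cuda-cudart               11.7.60              h9538e0e_0    nvidia/label/cuda-11.7.0
cuda-version              11.8                 h70ddcb2_3    conda-forge
cudatoolkit               11.8.0              h4ba93d1_13    conda-forge
"""

for label, listing in (("before", before), ("after", after)):
    cudart = cuda_package_versions(listing).get("cuda-cudart")
    status = "OK" if cudart and has_major(cudart, 11) else "mismatch"
    print(label, cudart, status)
```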

Now run the example file provided in the documentation. It should run successfully and print:

The RASCIL data directory is not available - continuing but any simulations will fail
Parameter 'use_gpus' is None! Using function 'karabo.util.gpu_util.is_cuda_available()'. To overwrite, set 'use_gpus' True or False.
Creating /tmp/karabo-STM-andreas-4S0i7QquzI/interferometer-FDTET8f00l for interferometer disk-cache.
Saved visibility to /tmp/karabo-STM-4S0i7QquzI/interferometer-FDTET8f00l/visibility.vis
Lukas113 commented 4 months ago

Closed by PR #573