saezlab / liana

LIANA: a LIgand-receptor ANalysis frAmework
https://saezlab.github.io/liana/
GNU General Public License v3.0
160 stars · 28 forks

Not able to use gpu on HPC #111

Closed kaizen89 closed 1 year ago

kaizen89 commented 1 year ago

Hi, I'm trying to run liana_tensor_c2c on an HPC cluster with device='cuda:0'; however, it seems that the CPU is used instead. Here's my R script:

library(liana)
library(Seurat)
library(parallel)
library(qs)
library(reticulate)
print("libraries loaded")
Bladder_data_sce = qread("data/LIANA_res_Tumor_adj_tensor.qs", nthreads = 4)
print("data loaded")
Bladder_data_sce <- liana_tensor_c2c(sce = Bladder_data_sce,
                        score_col = 'LRscore',
                        rank = NULL,  # set to NULL to estimate the rank for your data
                        how = 'outer',  # defines how the tensor is built
                        conda_env = "/mnt/beegfs/home/aabbas/miniconda3/envs/cell2cell",
                        use_available = T, # detect & load cell2cell if available
                        device = 'cuda:0'
                        )
qsave(Bladder_data_sce, "data/LIANA_res_Tumor_adj_tensor_gpu.qs", nthreads = 4)
print("task done")
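Since liana_tensor_c2c delegates the decomposition to the Python cell2cell/PyTorch stack via reticulate, a quick sanity check is whether PyTorch in that conda env can see the GPU at all. The sketch below is not part of the original script; pick_device is a hypothetical helper, and it assumes torch is installed in the cell2cell env (run it with that env's interpreter):

```python
# Check that PyTorch (the backend used for the tensor decomposition)
# can actually see the GPU; fall back to CPU otherwise.
def pick_device(requested="cuda:0"):
    """Return the requested CUDA device string if available, else 'cpu'."""
    try:
        import torch
        if torch.cuda.is_available():
            return requested
    except ImportError:
        pass  # torch not installed in this environment
    return "cpu"

print(pick_device("cuda:0"))
```

If this prints `cpu` inside the Slurm job, the problem is the environment or GPU allocation, not LIANA itself.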

And the Slurm script to execute it:

#!/bin/bash
#SBATCH --job-name=gpu_liana 
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4 
#SBATCH --mem=160gb
#SBATCH --time=24:00:00
#SBATCH --partition=dev_gpu
#SBATCH --account=dev_gpu
#SBATCH --gres=gpu:1 # the node has 2 GPU cards; requesting only 1
#SBATCH --output=gpu_abacus_%j.out # Standard output
#SBATCH --error=gpu_abacus_%j.err # Standard error log

source /mnt/beegfs/home/aabbas/.bashrc
conda activate base
echo "conda activated"
echo "will run script"
Rscript scripts/script.liana.R
Attaching SeuratObject
qs 0.25.5
Loading `/mnt/beegfs/home/aabbas/miniconda3/envs/cell2cell` Conda Environment
Building the tensor using LRscore...
  0%|          | 0/43 [00:00<?, ?it/s]
  2%|▏         | 1/43 [05:28<3:50:11, 328.85s/it]
  5%|▍         | 2/43 [14:56<5:20:43, 469.36s/it]
  7%|▋         | 3/43 [24:13<5:39:38, 509.45s/it]

  0%|          | 0/25 [00:00<?, ?it/s]
  4%|▍         | 1/25 [2:54:37<69:51:05, 10477.74s/it]

Slurm output

conda activated
will run script
[1] "libraries loaded"
[1] "data loaded"
[1] 0

srun --jobid=5099174 nvidia-smi

Fri May 19 11:38:26 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   24C    P0    24W / 250W |     61MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4861      G   /usr/bin/X                         60MiB |
+-----------------------------------------------------------------------------+

Any help would be appreciated. Thanks!

dbdimitrov commented 1 year ago

Hi @kaizen89,

I can't see anything that is obviously out of place. You could try running it in an interactive session and check whether any message appears, or whether it just silently falls back to the CPU.

I would also set use_available to FALSE and rely on conda_env alone, as you already do.

We set up more in-depth tutorials with related environment setup that might be worth a shot: https://ccc-protocols.readthedocs.io/en/latest/notebooks/ccc_R/QuickStart.html https://github.com/saezlab/ccc_protocols/tree/main/env_setup

Let me know if either helps.

dbdimitrov commented 1 year ago

You could also give GPU acceleration a shot via the tutorial's Python version, as then at least you stay in the same language (reducing complexity a bit).

kaizen89 commented 1 year ago

At first I was not able to install the environment using env_setup.sh; conda was stuck for too long. After several reinstallations I managed to do it with mamba and the two .yml files you provided. The GPU was then being used, but I ran into CUDA out-of-memory errors, and setting max_split_size did not help; I guess the dataset was too big. In the end, it's working well with a dataset of 80K cells. Thanks for your help!
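For readers hitting the same CUDA out-of-memory error: the allocator setting mentioned above is PyTorch's PYTORCH_CUDA_ALLOC_CONF environment variable, set before launching the job. The value 512 below is an arbitrary starting point for illustration, not a recommendation from this thread (and as noted above, it did not help here because the dataset was simply too large):

```shell
# Cap the CUDA caching allocator's split size (in MiB) before running the
# R script, e.g. near the top of the Slurm job script.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
echo "$PYTORCH_CUDA_ALLOC_CONF"
```

If tuning the allocator is not enough, subsampling cells or building a smaller tensor is the more reliable fix.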