vtraag / leidenalg

Implementation of the Leiden algorithm for various quality functions to be used with igraph in Python.
GNU General Public License v3.0

Problems installing leidenalg on a remote cluster #161

Closed markrbower closed 3 months ago

markrbower commented 6 months ago

$ R

library("reticulate")
py_install("python-igraph")
py_install("leidenalg", forge = TRUE)

library(igraph)
library(leiden)

All requested packages already installed.

python modules igraph and leidenalg installed

library(leidenalg)
Error in library(leidenalg) : there is no package called ‘leidenalg’
library(leidenAlg)
Error in library(leidenAlg) : there is no package called ‘leidenAlg’

vtraag commented 6 months ago

First of all, this is the repository about the C library underlying the Python package leidenalg. For that reason, I'm transferring this issue to the leidenalg repository.

The question is not entirely clear to me. It seems you are installing the Python packages leidenalg and python-igraph (note that the latter should now be installed as igraph instead). However, you are then trying to load a leidenalg or leidenAlg package in R. Please note that those are two separate things: there is a leidenAlg package in R (see https://github.com/kharchenkolab/leidenAlg), and there is this leidenalg package, which is Python-only. Hence, if you want the leidenAlg package, please refer to that package's instructions for installing it in R.

markrbower commented 6 months ago

Thank you for the help! At the "kharchenkolab" link above, I found two new things (to me): First, I found the devtools::install_github command string for "leidenAlg", but when I ran it on a remote cluster (where I don't have root/admin privileges) it came back with errors ending in:

ERROR: dependency ‘uwot’ is not available for package ‘sccore’
* removing ‘/vast/palmer/home.mccleary/bm662/R/x86_64-pc-linux-gnu-library/4.2/sccore’

The downloaded source packages are in
        ‘/tmp/RtmpKPhaqq/downloaded_packages’
✔  checking for file ‘/tmp/RtmpKPhaqq/remotes8b7322f2e9f9/kharchenkolab-leidenAlg-2c8d53f/DESCRIPTION’ ...
─  preparing ‘leidenAlg’:
✔  checking DESCRIPTION meta-information ...
─  cleaning src
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘leidenAlg_1.1.2.tar.gz’

Installing package into ‘/vast/palmer/home.mccleary/bm662/R/x86_64-pc-linux-gnu-library/4.2’
(as ‘lib’ is unspecified)
ERROR: dependency ‘sccore’ is not available for package ‘leidenAlg’
* removing ‘/vast/palmer/home.mccleary/bm662/R/x86_64-pc-linux-gnu-library/4.2/leidenAlg’
Warning messages:
1: In i.p(...) :
  installation of package ‘RcppEigen’ had non-zero exit status
2: In i.p(...) : installation of package ‘uwot’ had non-zero exit status
3: In i.p(...) : installation of package ‘igraph’ had non-zero exit status
4: In i.p(...) : installation of package ‘sccore’ had non-zero exit status
5: In i.p(...) :
  installation of package ‘/tmp/RtmpKPhaqq/file8b73268e424f5/leidenAlg_1.1.2.tar.gz’ had non-zero exit status
>

and I still cannot run the R command "leiden".

Second, I found a notice that "cluster_leiden" has been included in "igraph" since 2020. When I ran it, however, I got different results from "leiden", and they don't look as good:

LOCAL

> library(igraph)
> library(leiden)
> library(leidenAlg)
> g <- make_graph('Zachary')
> leiden(g)
 [1] 2 2 2 2 4 4 4 2 1 1 4 2 2 2 1 1 4 2 1 2 1 2 1 3 3 3 1 3 3 1 1 3 1 1
>

REMOTE CLUSTER

> library(igraph)
> g <- make_graph('Zachary')
> cluster_leiden(g)$membership
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34
> cluster_leiden(g,resolution_parameter=.05)$membership
 [1] 1 1 1 1 1 1 1 1 2 1 1 1 1 1 2 2 1 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2
> cluster_leiden(g,resolution_parameter=.1)$membership
 [1] 1 1 1 1 2 2 2 1 3 4 2 5 1 1 3 3 2 1 3 1 3 1 3 3 6 6 3 3 6 3 3 6 3 3
> 

This seems different than just different starting random seeds. "cluster_leiden" appears to be far more sensitive to the value of the resolution_parameter. Are "leiden" and "cluster_leiden" different algorithms?

markrbower commented 6 months ago

I figured out the problem on the first issue (errors installing "leidenAlg" using devtools). I was trying to do so from a "login node" of the cluster, which doesn't have enough memory. After moving to a "compute node", the installation went through without a problem.

When I try to run "leiden" in R, though, I get:

> library(igraph)
> library(leiden)
...
==> WARNING: A newer version of conda exists. <==
    current version: 23.1.0
    latest version: 23.11.0

Please update conda by running

    $ conda update -n base -c conda-forge conda

# All requested packages already installed.

conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
bioconda/linux-64                                           Using cache
bioconda/noarch                                             Using cache
pkgs/main/linux-64                                          Using cache
pkgs/main/noarch                                            Using cache
pkgs/r/linux-64                                             Using cache
pkgs/r/noarch                                               Using cache
Collect all metadata (repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
    current version: 23.1.0
    latest version: 23.11.0

Please update conda by running

    $ conda update -n base -c conda-forge conda

# All requested packages already installed.

python modules igraph and leidenalg installed
> leiden(g)
Error in py_module_import(module, convert = convert) :
  ModuleNotFoundError: No module named 'leidenalg'
Run `reticulate::py_last_error()` for details.
>

Why does it say that "leidenalg" is installed, but then says "No module named 'leidenalg'"?

vtraag commented 3 months ago

> it came back with errors

The reported error is

ERROR: dependency ‘sccore’ is not available for package ‘leidenAlg’

so it seems that you have to find a way to install sccore.

> After moving to a "compute node", the installation went through without a problem.

But it seems you already solved that problem.

> This seems different than just different starting random seeds. "cluster_leiden" appears to be far more sensitive to the value of the resolution_parameter. Are "leiden" and "cluster_leiden" different algorithms?

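They are the same algorithm, but they optimise different quality functions by default: igraph's cluster_leiden uses CPM (the Constant Potts Model), while leidenAlg seems to default to modularity-based optimisation. The resolution parameter has a very different scale in the two, which is why the results differ so much.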
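
To make the difference concrete, here is a minimal pure-Python sketch that scores three partitions of a small toy graph under both quality functions. The graph, the helper names (`cpm_quality`, `modularity`) and the unnormalised CPM form (internal edges minus resolution times the number of node pairs, per community) are assumptions for illustration; igraph's exact normalisation may differ, and this code does not call igraph or leidenalg.

```python
# Hypothetical toy graph: two 4-node groups (each a complete graph minus
# one edge) joined by a single bridge edge. 8 nodes, 11 edges.
EDGES = [
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3),  # group A (edge 2-3 missing)
    (4, 5), (4, 6), (4, 7), (5, 6), (5, 7),  # group B (edge 6-7 missing)
    (3, 4),                                  # bridge between the groups
]

SINGLETONS = list(range(8))             # every node in its own community
TWO_GROUPS = [0, 0, 0, 0, 1, 1, 1, 1]   # the two dense groups
ONE_GROUP = [0] * 8                     # everything in one community


def cpm_quality(edges, membership, resolution):
    """CPM: sum over communities of (internal edges - resolution * node pairs)."""
    sizes, internal = {}, {}
    for community in membership:
        sizes[community] = sizes.get(community, 0) + 1
    for u, v in edges:
        if membership[u] == membership[v]:
            internal[membership[u]] = internal.get(membership[u], 0) + 1
    return sum(
        internal.get(c, 0) - resolution * n * (n - 1) / 2
        for c, n in sizes.items()
    )


def modularity(edges, membership):
    """Modularity at resolution 1: internal edge fraction minus expected fraction."""
    m = len(edges)
    degree, internal = {}, {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )


for name, part in [("singletons", SINGLETONS), ("two groups", TWO_GROUPS), ("one group", ONE_GROUP)]:
    print(name,
          "CPM(1.0) =", cpm_quality(EDGES, part, 1.0),
          "CPM(0.1) =", round(cpm_quality(EDGES, part, 0.1), 3),
          "modularity =", round(modularity(EDGES, part), 3))
```

In this sketch, CPM at resolution 1 scores the singleton partition at 0 and the two-group partition at -2, so leaving every node on its own is preferred. That is consistent with the 34 singleton clusters that cluster_leiden(g) returned above. At resolution 0.1 the two-group partition scores highest under CPM, and it also scores highest under modularity at its default resolution.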

> Why does it say that "leidenalg" is installed, but then says "No module named 'leidenalg'"?

Unfortunately, I don't know exactly. It seems that it is somehow not picking up the right Python environment. Note, however, that the leiden R package uses reticulate to call the Python package from R. The leidenAlg package does not use reticulate, so you should have fewer problems with Python environments and the like. In short: do not use library(leiden), but use library(leidenAlg).
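
If you do want to track down the environment mismatch, one option is to ask the Python interpreter itself which executable is running and whether it can locate the modules. The sketch below uses only the standard library; `describe_module` is a hypothetical helper name. It rests on the assumption that the module was installed into a different environment from the one being loaded, which is a guess and not something confirmed in this thread.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether this interpreter can locate the top-level module `name`."""
    spec = importlib.util.find_spec(name)  # None when the module cannot be found
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


# Check the two Python modules that the leiden R package reports installing.
for module_name in ("igraph", "leidenalg"):
    info = describe_module(module_name)
    status = info["origin"] if info["found"] else "NOT FOUND"
    print(f"{info['interpreter']}: {module_name} -> {status}")
```

Run this with the same Python that reticulate has bound to (reticulate::py_config() in R prints that path). If leidenalg shows as not found there but is found when you run the script from the conda environment used for installation, the two environments differ.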