Closed VPetukhov closed 2 years ago
Hi @VPetukhov
I'm getting these errors elsewhere too, and I'm in discussions with the CRAN crew because I cannot update conos.
I don't understand it. One possible explanation is that an update to igraph is causing compilation errors.
https://cran.r-project.org/web/checks/check_results_leidenAlg.html
Could you double-check that this bug exists on previous package versions, e.g. 1.0.0, 1.0.1?
I think version 1.3.0 is causing problems: https://cran.r-project.org/web/packages/igraph/index.html
It was just updated on CRAN today.
Thanks, Evan! As they say, igraph has leiden built-in now. So should we switch leidenAlg back-end to R igraph and remove the c++ part completely? Maybe also deprecate the repo and move rleiden to sccore.
We just need to keep in mind that cluster_leiden has different parameter naming than leiden.community: resolution_parameter instead of resolution and n_iterations instead of n.iterations. So given how many snippets there are in the world, we can't just replace one with another.
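To make the renaming concrete, here is a minimal, language-neutral sketch in Python of what a compatibility shim would have to do. The helper name translate_leiden_args is hypothetical; the argument names are the ones quoted in this thread, and pinning the objective to modularity is an assumption based on later comments here that leiden.community only implemented modularity while cluster_leiden defaults to CPM.

```python
# Hypothetical shim for illustration only: translate_leiden_args is not a real API.
# The name pairs below are the ones quoted in this thread.
LEGACY_TO_IGRAPH = {
    "resolution": "resolution_parameter",
    "n.iterations": "n_iterations",
}


def translate_leiden_args(legacy_args):
    """Rename leiden.community-style arguments to cluster_leiden-style ones."""
    translated = {}
    for name, value in legacy_args.items():
        # Unknown names pass through unchanged.
        translated[LEGACY_TO_IGRAPH.get(name, name)] = value
    # Assumption from this thread: the old function optimised modularity,
    # while cluster_leiden defaults to CPM, so pin the objective unless
    # the caller already chose one.
    translated.setdefault("objective_function", "modularity")
    return translated


new_args = translate_leiden_args({"resolution": 1.0, "n.iterations": 2})
print(new_args)
```

An actual wrapper would live in R and forward these arguments to cluster_leiden; the point of the sketch is only that both the names and the default objective need translating.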
Look for cluster_leiden() in igraph 1.3.0; I'm not sure how this implementation compares to @vtraag's original C++ implementation, though.
The implementation in igraph is more limited than the full leidenalg package: it supports fewer quality functions, does not support multiplex community detection, and does not support directed graphs. For many purposes that will probably do, but if you want all of these things, then you will still need to use the leidenalg package.
Updating the igraph headers appears to have worked. I can simply version the package.
@vtraag I've added some details in the README. Does this sit well with you? Are there any other details you'd prefer? I'm trying to advertise that this function now exists in igraph and that you wrote it.
RE: retiring the package
Agreed, there's still value here. I also don't want to deal with the headache of rewriting everything on our end.
People do use this package---others have found value in it, e.g. https://github.com/kharchenkolab/leidenAlg/issues/2
At some point, I was discussing improving the functionality of this package with Vincent, but... there's just not enough time on my end. (Sorry about that, Vincent!)
Hi Evan. Viktor encouraged me to let you know that for me using R-4.1.2 on RHEL8 none of the current branches does the trick - I still get leidenAlg.so: undefined symbol: igraph_rngtype_mt19937.
In the context of making the transition as easy as possible, I am wondering which features of @vtraag's leidenalg code this package exposes that are not provided by cluster_leiden in igraph 1.3. As far as I can tell, everything is exposed through find_partition, but this function only accepts a graph, edge weights, a resolution, and a number of iterations. It does not seem to provide any choice of quality functions, customization through vertex weights, or any of the other things Vincent mentions:
> The implementation in igraph is more limited than the full leidenalg package, it supports a more limited number of quality functions, does not support multiplex community detection and does not support directed graphs.
So it seems the only thing provided here that's not in igraph is support for directed graphs? Otherwise, cluster_leiden is more flexible than find_partition.
Do any of the downstream users rely on support for directed graphs? If not, would it be an easier path for you @evanbiederstedt to just re-implement find_partition in terms of cluster_leiden (since it really provides only a subset of cluster_leiden's functionality)?
Am I missing anything here?
Agree with @szhorvat here. I'm not aware of any usages for directed graphs: the only dependency on CRAN is conos and we don't use it for scRNA-seq analysis in the lab.
@evanbiederstedt, I created a branch no_cpp where I replaced all cpp code with calls to cluster_leiden. Please feel free to merge it to main if you agree with the idea. Otherwise, we can have it as a temporary workaround to install leidenAlg until the main issue is fixed.
@rrydbirk, please try devtools::install_github('kharchenkolab/leidenAlg', ref='no_cpp').
@VPetukhov, leidenAlg/no_cpp and conos installed without problems.
Hm, somehow clustering with cluster_leiden and objective_function="CPM" (default) provides really bad results: I get either 2 or 200+ clusters. But when setting objective_function="modularity", everything gets back to normal. @rrydbirk, be aware of the problem.
As setting objective_function by hand every time is annoying, maybe we shouldn't deprecate leidenAlg::leiden.community after all... I changed this default in the last commit (https://github.com/kharchenkolab/leidenAlg/commit/86e43e1fc37f91fe381ef675dd8df9c9beb87f45).
For reference, I tested it on a mutual-Nearest-Neighbors graph with 26597 vertices and 400759 edges.
@vtraag Is this expected? (see above). What was the default in the original implementation that leidenAlg used? If it was modularity, shouldn't the default be changed to modularity instead? (I would claim that this is essentially a bugfix and not a breaking change).
Is CPM the default on purpose? That's a strange choice. I agree that changing that would be a bugfix.
Sorry, I'm only getting back to all of this now.
> somehow clustering with cluster_leiden and objective_function="CPM" (default) provides really bad results: I get either 2 or 200+ clusters.
This is presumably because you don't pick a resolution_parameter different from the default of 1, which for unweighted graphs means that it will find a singleton partition. This is a conscious choice, and aims to make clear that people should make a choice here.
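To illustrate why a resolution of 1 leads to singletons on an unweighted graph, here is a minimal sketch that evaluates the CPM quality of two partitions by hand. It assumes the unnormalized form of CPM with unit vertex weights (internal edges minus resolution times the number of vertex pairs, summed over communities); igraph may scale the value differently, and the toy graph and the function name cpm_quality are made up for this example.

```python
def cpm_quality(edges, communities, resolution):
    """Unnormalized CPM: sum over communities of (internal edges - resolution * vertex pairs)."""
    membership = {v: i for i, members in enumerate(communities) for v in members}
    internal = [0] * len(communities)
    for u, v in edges:
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    total = 0.0
    for i, members in enumerate(communities):
        n = len(members)
        total += internal[i] - resolution * n * (n - 1) / 2
    return total


# Toy graph (illustrative): two 4-cycles joined by a single bridge edge (3, 4).
edges = [(0, 1), (1, 2), (2, 3), (3, 0),
         (4, 5), (5, 6), (6, 7), (7, 4),
         (3, 4)]
singletons = [[v] for v in range(8)]
two_cycles = [[0, 1, 2, 3], [4, 5, 6, 7]]

for gamma in (1.0, 0.5):
    print(gamma,
          cpm_quality(edges, singletons, gamma),
          cpm_quality(edges, two_cycles, gamma))
```

With a resolution of 1, every vertex pair placed in the same community contributes at most 1 - 1 = 0, so no grouping can score above the singleton partition; here the two-cycle partition scores below it. At a resolution of 0.5 the ordering flips and the two cycles win.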
> What was the default in the original implementation that leidenAlg used? If it was modularity, shouldn't the default be changed to modularity instead? (I would claim that this is essentially a bugfix and not a breaking change).

> Is CPM the default on purpose? That's a strange choice. I agree that changing that would be a bugfix.
In my (Python) leidenalg package, there is no default quality function, you should always specify it. In the R package leidenAlg here, only modularity seems to have been implemented, none of the other quality functions seem to have been made available. Given these limited implementations, I think any dependencies can more easily just rely on the cluster_leiden function directly, and I would deprecate this R leidenAlg package. In the cluster_leiden function in igraph itself, I deliberately chose CPM as the default, not modularity. I believe CPM to be superior to modularity in many respects, and this is why I have used CPM as the default, and I would not advocate for changing this.
> I believe CPM to be superior to modularity in many respects, and this is why I have used CPM as the default, and I would not advocate for changing this.
IMO the problem with this approach is that it assumes that people who use the function have a good understanding of how it works. That's almost never the case. I expect that the typical user will try all community detection functions, one by one, to see which gives a good result with default settings. The defaults for this one are CPM with resolution=1, so bad results seem to be almost guaranteed?
It would make sense to choose a default which works reasonably well for most networks.
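As a rough sense of scale, the sketch below compares the default resolution of 1 with the edge density of the graph reported earlier in this thread (26597 vertices, 400759 edges). It assumes the graph is treated as unweighted, and it relies on the general property of CPM that communities are favoured only when their internal density exceeds the resolution; it is a back-of-the-envelope check, not a recommendation made by anyone in the thread.

```python
n_vertices = 26597   # graph size reported earlier in this thread
n_edges = 400759

# Number of distinct vertex pairs in a simple undirected graph.
possible_pairs = n_vertices * (n_vertices - 1) // 2
density = n_edges / possible_pairs

default_resolution = 1.0  # cluster_leiden default, as stated in this thread
print(f"density = {density:.6f}")
print(f"default resolution is about {default_resolution / density:.0f}x the density")
```

The density comes out at roughly 0.001, so a resolution of 1 sits several hundred times above it. Under the stated assumption, a CPM resolution closer to the scale of the density is a more plausible starting point to tune from.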
@evanbiederstedt @VPetukhov just to confirm: did you manage to work around the symbol visibility issue in leidenAlg? We are preparing a CRAN submission of 1.3.1 now and it looks like there's no need to revert the symbol visibility settings on our end as you managed to sort things out; is that correct?
Whatever you decide here @ntamas
We'll jump through your hoops
Revdep checks on my machine did not indicate any breakages with leidenAlg when comparing igraph 1.3.0 with the current dev branch, so hopefully there will be no more hoops this time. Fingers crossed.
Thanks @ntamas, I really do appreciate the help getting this resolved quickly.
We'll get this working together some way or another :) If not hoops, then maybe hurdles next time :)
Best, Evan
Hi @evanbiederstedt ,
I'm trying to install leidenAlg on R-4.1.2 (libxml2, glpk and gmp are installed). Compilation goes fine, but the last step returns
The same thing happens on my local laptop with Gentoo and on a server with RedHat 8. Do you have any idea how to debug this?
If I run devtools::load_all(), I can check the compilation files. Then, nm -D /tmp/Rtmpxafa69/pkgload32db541a1f34/leidenAlg.so shows that there is indeed a symbol U igraph_rngtype_mt19937. And the output of ldd /tmp/Rtmpxafa69/pkgload32db541a1f34/leidenAlg.so is the following:
I tried to randomly tweak Makevars with no success.