Open szhorvat opened 2 years ago
Yes, fair point about the explicit formula's, probably good to include that.
What might be a reasonable default, but still keeping CPM as a default, and without doing some annoying normalisation or something, would perhaps be the following. Usually, in most cases, resolutions of interest are located somewhere around the graph density, so we could take resolution_parameter = edge_density(graph)
as a starting point.
There are two difficulties however with this default. If a user would select objective_function = "modularity"
this default would not make much sense, and the default of 1
would make more sense. Secondly, if weights are supplied, the "weighted density" (i.e. sum(weights)/(n(n - 1)/2)
with n
the number of nodes) should be used as a default instead.
I'm not directly sure how this can best be implemented in R. It seems you can depend on other arguments in the default argument, right? If so, something like
resolution_parameter = switch(objective_function,
CPM=if (weights) weighted_density(graph) else density(graph),
modularity=1)
might do the trick? (Not sure if this type of inline if
works in R)
It seems you can depend on other arguments in the default argument, right?
I think this would work, but you need to write something like ifelse(is.null(weights), density(graph), weighted_density(graph))
. But IMHO weighted_density(graph, weights)
could fall back to the unweighted density if weights
is NULL
or NA
(assuming we have such a function at all).
And of course you need to wrap the whole thing in a giant if (is.missing(resolution_parameter))
to let the user bring their own.
I think this would work, but you need to write something like
ifelse(is.null(weights), density(graph), weighted_density(graph))
. But IMHOweighted_density(graph, weights)
could fall back to the unweighted density ifweights
isNULL
orNA
(assuming we have such a function at all).
Right, thanks!
And of course you need to wrap the whole thing in a giant
if (is.missing(resolution_parameter))
to let the user bring their own.
I believe I can write this simply as the default within the function declaration. That way, if a user supplies its own value it would automatically always use that, right?
Yup, but maybe for the sake of readability it's better to put it in the function body. I leave it up to you.
Can the default resolution_parameter
value not depend on the value of objective_function
?
Another comment: would it be better to use the short resolution
instead of the very long resolution_parameter
? cluster_louvain
uses resolution
.
Can the default
resolution_parameter
value not depend on the value ofobjective_function
?
I'm not sure if I understand this. I propose that it does depend on the value of the objective_function
. By design it has to, the same resolution default do not make sense for both objective functions. What would you propose exactly?
Another comment: would it be better to use the short
resolution
instead of the very longresolution_parameter
?cluster_louvain
usesresolution
.
Yes, that's no problem of course.
What is the "weighted density"?
"weighted density" (i.e. sum(weights)/(n(n - 1)/2) with n the number of nodes)
I've improved the documentation in https://github.com/igraph/rigraph/commit/49f8b1714c97f3519a358763b6039866a0fb1fb4. I haven't yet changed the default settings, so we should still leave this issue open.
Moving this to a future release as the documentation was improved, and changing the default settings would be a potentially breaking change and I'd rather not re-run ~900 revdepchecks now :)
For a small unweighted graphs with degree outliers, like Zachary, an alternative is:
median(degree(g)) / (vcount(g) - 1)
or
quantile(degree(g))[2] / (vcount(g) - 1)
Note: strength(g) works for both weighted and unweighted graphs.
@vtraag Should the default objective function be changed to modularity for 2.0?
I've been seeing repeated complaints on GitHub / StackOverflow that "Leiden doesn't work as well as Louvain / doesn't give similar results / only gives one community". This is likely due to the naive expectation that cluster_leiden
should maximize modularity by default. I know that you have good arguments in favour in CPM, but these go way over the head of the typical biologist ...
@ szhorvat
In my humble opinion as a casual user: probably not.
The problem is that the default resolution parameter is set to 1 as a default value. This gives almost certainly an undesirable partition when the objective function is CPM.
g <- make_graph('zachary'); plot(cluster_leiden(g), g)
My suggestion. If the objective is "CPM" then set the resolution parameter to either:
The latter offers the user an obvious way to try different granularities.
Other suggestion. Copy the input parameters "objective_function" and "resolution_parameter" to the output object, for reason of reproducibility.
The
cluster_leiden()
documentation does not have the formula for the objective function. What is being maximized usingCPM
andmodularity
, respectively?Additionally, since
CPM
is the default, the docs should provide guidance on how to get good results with CPM, i.e. how to set theresolution
. The default value of 1 does not usually lead to good results. As a result, it's almost guaranteed that the results will not be good if using the default settings (a simplecluster_leiden(g)
call). In my mind, this makes CPM a bad default choice unless it is accompanied by an appropriate default forresolution
as well—but this is for @vtraag to decide. I expect that most users won't even read the documentation, and instead will just try the defaults.