Open ntamas opened 2 years ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Hello! I would like to contribute to igraph and noticed that this is labeled as a good first issue, which I am hunting for, since it will be my very first contribution. Is this still open and if so, is everything I need to get starrted in the wiki?
Yes, the issue is still open and feel free to get started on it. There's no wiki page but the README contains some instructions for getting started with development (after checking out the repo and all its submodules); see here.
I'd suggest that you open a PR with your draft as soon as possible (and mark it as draft) and commit often so we can give you early feedback if needed.
As for this specific task, you can use my SO response as a starting point, but it needs to be made a proper method of the VertexClustering
class. I'd probably add a new filter_by_size()
method to the VertexClustering
class.
Wouldn't it make sense to try to design some sort of generalized filtering mechanism instead of a one-trick method that only handles sizes?
What if:
V
, representing areas in the visual cortex within a brain network.VertexCover
, not VertexClustering
Generally speaking, the area where most open source projects don't do so well is having a coherent, predictable, well thought out design. If you work with a system like Mathematica often enough, this becomes painfully obvious. I would like igraph to stand out from the crowd by being better in this area.
Wouldn't it make sense to try to design some sort of generalized filtering mechanism instead of a one-trick method that only handles sizes?
I was considering this, but I don't want to overstretch the scope of this issue. It's always possible to add such a filtering method later and then rewrite filter_by_size()
based on this. One advantage of adding a separate method for size (which is the most common filtering that people want to use) is that you don't actually need to construct the subgraph.
That being said, I can imagine adding a filter()
method to VertexClustering
that takes a Boolean predicate. The predicate would be a Python callable that takes the list of vertices in the cluster and returns True/False, but in that case it would be the job of the predicate to construct the subgraph on-the-fly if needed.
Alright, that makes sense. But it's still a useful exercise to think about how a generalization could be formulated and then implemented. Creating the subgraph seems like a heavyweight operation. Would it make sense to add a helper function to the C core which can retrieve the IDs of edges contained within a subgraph induced by some vertices? Then it will become relatively easy to compute things like:
More generally, it might make sense for a filtering method to have access to:
The first is trivial, as the clusters are generally understood in terms of their vertices and API provides access to these through .as_cover()
. There does not seem to be a simple way for the second. Am I missing something? If not, do you agree that it makes sense to add the support function to the C core, or at least discuss these ideas further somewhere else (forum?)
Would it make sense to add a helper function to the C core which can retrieve the IDs of edges contained within a subgraph induced by some vertices?
Sounds like a good idea.
More generally, it might make sense for a filtering method to have access to:
I think that all that the filtering function needs access to is the graph object itself and the members of the current cluster being considered. The graph might not even be needed as you can always add it using functools.partial
, e.g.:
def some_predicate(graph, vertices):
# do something and return True/False here
clusters.filter(partial(some_predicate, graph))
This would allow .filter()
to be implemented in the Clustering
base class where we don't already have a graph.
But again, I still maintain that this is outside the scope of the current issue, which only asks for filtering based on cluster sizes, so in order not to derail the current discussion, I'd recommend opening a separate issue for a more generic filtering method.
Is anyone still working on this issue currently?
What is the feature or improvement you would like to see?
It is a common requirement when clustering a large graph to plot only the larger communities but not the smaller ones, as illustrated by this SO question.
Maybe it would be even more flexible if we allowed the user to pass an upper and a lower size limit, or an aribtrary filtering predicate based on the sizes.
I've posted a quick sketch of a possible implementation in the SO question.
Use cases for the feature
See above; the SO question provides a possible use-case.
References
N/A