Open brooksambrose opened 6 years ago
is there an interest in having this implemented? I wouldn't mind working on this.
Yes, go ahead. I think this is a good difficulty level. I might not be able to answer questions quickly these days, but perhaps someone else will jump in.
Is the task clear @pradkrish ?
Suppose we have a graph where the vertices are divided into two groups, $A = \{ a_1, a_2, \dots \}$ and $B = \{ b_1, b_2, \dots \}$. Suppose edges run only between the two groups, not within groups (or that we only consider these edges). Then we can define a projection to $A$ as a graph with vertex set $A$ where vertices are connected if they have a common neighbour in $B$.
There is already functionality to count how many common neighbours they have.
This issue is to make it possible to combine the attribute of edges between $A$ and $B$ that induce a new edge within $A$. We'll need to think a bit about the best way to do that.
A simple way is to only handle weights (pass as a weight vector), and just add up the weights. It might be a good idea to go for this simple solution at the moment.
A much more flexible solution is to create a multigraphs and copy over all edge attributes, which can be later combined arbitrarily with igraph_simplify()
.
Thus, suppose that these connections exist: a1 - b1 - a2
and a1 - b2 - a2
. We could make each of these four connections induce a single connection in the projection, creating four edges between a1
and a2
. This is still not fully flexible though because it is conceivable that some application required a different attribute combiner for combining a1 - b1
with a2 - b1
vs a1 - b1
with a1 - b2
.
So the first task would be to think carefully about the best interface to use here.
Input/suggestions about this is welcome (@brooksambrose @vtraag @ntamas @GroteGnoom).
Thanks. How about an interface like this:
some_func(...., aggregator1, weights1, aggregator2, weight2)
, where aggregator can be sum
, mean
etc.; weights1
=> weight vector for the edges between A and B, weights2
=> weight vector for the edges between B and A.
In the case of the above example of a1 - b1 - a2
and a1 - b2 - a2
:
some_func(..., 'sum', [1, 1, 1, 1,], 'sum', [1, 1, 1, 1])
will create 4 edges between a1
and a2
and 4 edges between b1
and b2
some_func(..., 'sum', [1, 1, 1, 1,], 'mean', [1, 1, 1, 1])
will create 4 edges between a1
and a2
and one edge between b1
and b2
If we are only talking about undirected edges between the two groups A
and B
, it's enough to just have one weight vector and still have two aggregators, one for each projection.
The typical way in igraph's API to specify how attributes should be treated during an operation that can potentially merge vertices or edges is by passing a const igraph_attribute_combination_t*
object to the function as the last parameter; see igraph_contract_vertices()
and igraph_simplify()
for examples.
An igraph_attribute_combination_t
struct holds multiple "attribute combination records", each record specifying what to do with one specific attribute. There's also the possibility of adding a catch-all attribute combination record that applies to attributes not otherwise specified. See this page in the manual for more details.
Now, as @szhorvat outlined, the thing that we need to think about is how to combine edges. Suppose that we have these paths in the original graph:
a1 - b1 - a2
a1 - b2 - a2
a1 - b3 - a2
In my mind, there are two different types of contractions being involved here. First, we contract edges a1-b1
and b1-a2
into a1-a2
, then a1-b2
and b2-a2
into another instance of a1-a2
, then a1-b3
and b3-a2
into a third instance of a1-a2
. Then, in the next round, we contract these three instances into a final a1-a2
edges. The two types of contractions should be controlled separately, so we would need two const igraph_attribute_combination_t*
arguments, with catchy names that clearly explain how these contractions are performed and why they are different. I'd call them type 1 contractions and type 2 contractions below, but I'm not satisfied with these names, it's only that I could not come up with anything better in two minutes :)
I feel that the function should handle type 1 contractions "manually", inside the function, by writing explicit code for it, but type 2 contractions can be handled by a call to igraph_simplify()
that is performed at the end of the function call. Or, to avoid having two types of attribute combination lists for a single function call, we could leave out type 2 contractions entirely from the domain of this function -- we could simply say that if a type 1 attribute combination list is specified, igraph will create multiple edges if needed, and it is the responsibility of the user to call igraph_simplify()
later on, with the appropriate type 2 attribute combination list.
@szhorvat what do you think?
It sounds reasonable to me to do type 1 in this function and leave type 2 to simplify
. These two types remind me of resistors connected in series or in parallel. We might try to find names based on that.
This operation is becoming sufficiently different from what the function did originally that I'm wondering if we should have a separate function for this.
Thanks. I think the interface is going to be the same regardless of whether the suggested function is going to handle only type1 or both. I can begin working on a function that at least implements type1. Any suggestions for the interface?
Just pinging to check whether you have any suggestions. I have a long weekend coming, happy to start working on this.
So, if the function needs to handle both type 1 and type 2 simplifications, then there is a theoretical possibility that the two could be different, so a single argument would not be enough (although we could make it so that when type 2 is NULL, then it is the same as type 1). But the more I think about it, the more I am of the opinion that type 2 simplifications should remain in the domain of igraph_simplify()
and this function should be dealing only with type 1 simplifications (i.e. merging a pair of A-to-B and B-to-A edges to a single A-to-A edge and combining the attributes there).
If this is the case and we all agree on this, then I guess the way to go would be to add a new igraph_weighted_bipartite_projection()
function, like this:
igraph_error_t igraph_weighted_bipartite_projection(const igraph_t* graph, const igraph_vector_bool_t* types,
igraph_t *proj1, igraph_t *proj2, const igraph_vector_t *weights, igraph_integer_t probe1,
const igraph_attribute_combination_t *comb);
The order of the arguments follows this convention:
graph
and types
)proj1
and proj2
)We should allow proj1
and proj2
to be NULL
in case only one of them is needed. When weights
is NULL
, we can fall back to calling igraph_bipartite_projection()
. Also, comb
can be NULL
, which is equivalent to "drop all edge attributes, I don't care".
Is this enough to get started? Do not hesitate to ask more questions if needed - I'll be around until Friday but I probably won't follow the issue tracker during the weekend.
I had a similar design in mind. This is enough to get started, thanks.
A question. I guess const igraph_attribute_combination_t *comb
in this case will be set by the user. is the example below one way of doing it?
igraph_vector_t weights;
igraph_vector_init_seq(&weights, 1, 10);
SETEANV(&g, "weights", &weights);
igraph_attribute_combination(&comb,
"weights", IGRAPH_ATTRIBUTE_COMBINE_PROD,
"", IGRAPH_ATTRIBUTE_COMBINE_IGNORE,
IGRAPH_NO_MORE_ATTRIBUTES);
igraph_weighted_bipartite_projection(...);
In this example, the edge weights that are contracted will be multiplied.
Yes, this seems correct.
I made an attempt but realised that's not the right approach. I am still not successful in coming up with an igraph
way of solving this :cry: Happy to receive any help with the algorithm design.
Sorry for the delay; happy to jump on this and help out if you are willing to open a draft PR with whatever you have managed to come up with so far. I think the API is good in general, and I can sketch up the internals of the algorithm up to a point where you can feel more confident moving forward.
Just checking in on the status of this, as I was just looking for this exact functionality within R igraph. I did, however, find an implementation of an interesting (Newman's) approach to solving the weight unification within the tnet package: https://toreopsahl.com/tnet/two-mode-networks/projection/
Maybe this helps?
The bottleneck is currently manpower (both for implementing features and reviewing externally contributed implementations). Since this is highly requested, I added it to the 0.11 milestone so that it would not be forgotten. That does not mean a guarantee that it'll be in 0.11 though.
So I was looking into this a bit. It seems like there are two methods described on Tore Opsahl's page above: one that simply sums up the weights of the edges that are being collapsed into a single one during the projection, and one that pre-divides the weights by the degrees of the nodes in the 'other' vertex class (i.e. the class that will eventually get removed during the projection).
For me it seems to me that the latter can be achieved simply by using the pre-divided edge weight vector instead of the raw edge weight vector, so it's not really a different algorithm implementation-wise. We could simply add a const igraph_vector_t* weights
argument to igraph_bipartite_projection
but we won't need a separate method
argument. On the other hand, could there be other methods that people would want to use and that cannot be implemented simply by transforming the weights? If so, we would need a method
argument or multiple functions for bipartite projections.
Another, slightly related question is whether we want to have an igraph_weighted_bipartite_projection()
function (and then we can keep the current function without breaking API, and we can release this even in the next patch release).
I'm not a big fan of API breakages, so if there are no objections, I will move forward with adding a new function with the following signature (i.e. separate function, no method
argument):
igraph_error_t igraph_weighted_bipartite_projection(
const igraph_t* graph, const igraph_vector_bool_t* types, const igraph_vector_t* weights,
igraph_t* proj1, igraph_t* proj2, igraph_vector_t* proj1_weights, igraph_vector_t* proj2_weights,
igraph_integer_t probe1
);
Edit: after reading the page once again, it seems like there is a certain kind of asymmetry between in the definition. If we have nodes of type 1 and type 2, then the weight of the edge from u to v in the projected graph is the sum of all the weights leading from u to p (for all p in type 2) and not from v to p, so in some sense you end up with a directed graph even if the input was undirected. Of course we can create two edges between u and v (in both directions), and then the user could decide how to collapse them into a single edge at a later step.
There's also the problem that igraph_bipartite_projection()
currently ignores the directions of the edges; I don't know whether it's a problem or not and whether there are meaningful real-world problems where it makes sense to construct a directed bipartite graph where edges could go in both directions and then do a projection of it. In case of such a projection, one could actually deal with in-in, in-out, out-in and in-out projections (not even mentioning the possibility of ignoring edge directions, potentially in one side of the graph only) so it gets a bit hairy at that point, and I'm not sure whether it's worth thinking about all these corner cases if they are not used in practice.
Another, slightly related question is whether we want to have an
igraph_weighted_bipartite_projection()
function
I think a separate function is fine. We have separate functions for weighted and unweighted shortest path algorithms. We could still consolidate the two versions into a single functions in a future version, if that seems like a good idea.
Regarding directed graphs:
That seems like a sufficiently different situation that it merits its own separate discussion and its separate function. It reminds me a bit of the idea of the directed clustering coefficient ("transitivity"), which has multiple possible directed generalizations. In a directed bipartite graph, if the first partition has vertices 1
, 2
and the second has vertex A
, then there are four different possible relations between 1
and 2
:
1 -> A -> 2
1 <- A <- 2
1 <- A -> 2
1 -> A <- 2
Cases 3 and 4 are essentially cocitation and bibliographic coupling, respectively. See #2064.
Regarding 1 and 2, which are really the same, this is closely related, but not identical to the concept of graph power. See #2063.
Work has been started on this feature in the feat/weighted-bipartite-projection
branch.
Since we decided that there will be a separate function for this, it is not API-breaking. Deferring this to 1.1 so we can move forward with 1.0 faster, given our limited resources.
The multiplicity argument to bipartite_projection acts as a count of the number of nodes of the first mode that two nodes of the second mode are both connected to, and vice versa. It would be useful to allow functions to aggregate weights associated with those ties to result in a new weight in the single mode projection. Even better would be a new attribute as a list of edge id's of the bipartite graph associated with the two down-mode projections multiplicity so the user could perform any custom operation they like to describe the multi-edge.