Open clhunsen opened 6 years ago
I've come to look at this issue. After inspection, I find that the described issue is indeed present, i.e., we do indeed need to perform the described steps to simplify a network properly. Im unsure what the performance impact of this convolution is though.
Towards the proposed solutions: I don't think we can use "unique"
in the EDGE.ATTR.HANDLING
, for the reason that I am just unsure what the expected outcome would be. Same goes for the sqldf
suggestion. The underlying issue is that the igraph::simplify
function (that we use) has no notion of grouping over certain attributes. Therefore, we have to pre-group ourselves. In what way we perform pre-grouping can only have minor impacts on performance and readability I suppose.
Possible solutions I see are: Reimplementing the igraph::simplify
function with a notion of grouping. This approach may not be a fan favorite 😂. Then, we could also go to the drawing board and figure out how to include grouping into custom lambda functions that we then set as values into the EDGE.ATTR.HANDLING
. This may work, but I have not yet thought about the specifics.
Preamble: In our last meeting, we discussed, that any form of performing simplification on the edge list directly involves us reimplementing igraph
s simplification logic, which as enticing it may sound for the performing of our code, is definitely not something we want to do. Nevertheless, we do want to reduce the amount of times, where we have to obtain, edit, set, obtain, edit, set ... the edge lists of the individual networks. A good step into that direction is to replace the call to merge.networks
with something implemented by the igraph
people, in the hope that they may implemented it in a more performant way.
So I have been searching the docs for a function to merge networks natively. igraph::union
does not do what we want, as it creates duplicate columns in the edges for every column that appears multiple times in the input networks, i.e., the unified network would have columns type_1
, type_2
, type_3
... kind_1
, kind_2
, kind_3
, etc. which would require us to once again iterate the edge list and merge together these artifacts. However, I found that the igraph::disjoint_union
function does what we need. The function assumes that input nets are disjoint and creates an output network in which there are copies of every vertex of all input nets with a single nicely formated edgelist. This means in our case, that if we merge 4 networks and all have these vertices A, B, C
, then the disjoint union will have vertices A, B, C, A, B, C, A, B, C, A, B, C
but the edge list will appear "correctly". We can then contract all vertices that are effectively equivalent through igraph::contract
and then achieve the same output as with merge.networks
(all tests ☑️).
I am not here to judge for the efficiency of this solution. Under the assumption, that the igraph people did a nice job in the implementation of disjoint_union
and contract
, then we may achieve a significant benefit, otherwise, Im not sure. Also I don't have the big testing data on me to evaluate performance. Nevertheless, this solution does reduces code duplication and would make merge.networks
and merge.network.data
mostly obsolete.
My code (util-network.R line 1720ff
):
## merge the simplified networks
network = igraph::disjoint_union(networks)
vertex.count = igraph::vcount(networks[[1]])
network = igraph::contract(network,
rep(1:vertex.count, length(networks)),
vertex.attr.comb = "first")
Description
Currently, the code in the function
simplify.network()
is quite complex: To preserve an edge for each existing relation, we need to obtain the complete edge list, split it, construct networks from the separate different edge lists again, simplify these, and merge the separate networks in the end (by obtaining their edge lists...).We need to find a way to apply the
EDGE.ATTR.HANDLING
on the raw edge lists (i.e., data.frames). Maybe, using the existing list when constructing an appropriatesqldf
statement is a worth a try. Also, trying to use"unique"
as a value in theEDGE.ATTR.HANDLING
list may solve the issue right away. :wink:https://github.com/se-passau/codeface-extraction-r/blob/2241817620505a396846da1d2afa0264afafb55a/util-networks.R#L1192-L1233
https://github.com/se-passau/codeface-extraction-r/blob/2241817620505a396846da1d2afa0264afafb55a/util-networks.R#L44-L63
Versions
This affects the upcoming version
v3.2
as it contains the code of PR #115.