Closed bockthom closed 7 months ago
I have implemented a simple disambiguation of the relation
attribute here. After executing all tests, I did not find any breaking sections in coronet (as of right now, I did not add any tests myself for this). Especially for the case you mentioned, im not sure why it should break. Manual inspection showed it working just fine 😄
I have implemented a simple disambiguation of the
relation
attribute here.
I've never used modifyList
, but it seems to be helpful :) However, I am not sure if your implementation works as intended in all cases. One characteristic that I forgot mentioning before was sorting: We need to make sure that c("cochange, "issue") and c("issue","cochange") are treated equal, which is why we should sort the unique vectors afterwards, to be on the safe side.
After executing all tests, I did not find any breaking sections in coronet (as of right now, I did not add any tests myself for this). Especially for the case you mentioned, im not sure why it should break. Manual inspection showed it working just fine 😄
Hm, I am not convinced yet. In the case mentioned above, the following line returns different values in different cases:
data.sources = unique(igraph::E(network)$relation)
If every edge represents a single relation, then data.sources
can be the following list:
list("issue", "mail", "cochange")
However, if we simplify multi-relation edges, it can be the following list:
list(c("cochange", "issue"), c("issue"), c("issue","mail"), c("cochange", "issue", "mail"))
In this case, we have four edges with four different unique relations, but this does not resemble the data sources, but a combination of them. I am not sure if the subsequent sapply
call (see the cited code snippet above) can deal with these vectors that consist of multiple elements properly. Please check again what happens in this function in such a case.
Oh by the way, I have fixed that one function affected by the changes to the relation attribute. I forgot that pushes to that branch do not notify anyone of you, but please look at this to oversee the fix. I have started to working on tests for the new simplification and will let you know more details in our next meeting 😄
I had a quick look at the last two commits on the referenced branch. Looks good to me!
Regarding the edge attribute handling, an additional idea comes to my mind now: Wouldn't it be possible to move function relation.handling
directly to the constant EDGE.ATTR.HANDLING
? If this would be possible, then we would make sure that this is handled the same way whenever EDGE.ATTR.HANDLING
is used (wherever this will happen in the future), and we would get rid of the modifyList
call here. Just an idea, not sure if it works, but it is worth a try :smile:
Im not entirely sure, I understand what you mean. Obviously, we cannot move relation.handling
into EDGE.ATTR.HANDLING
's definition, as it is not always the way we want to work. So I assume you mean that at the point of simplifying with the simplify.multiple.relations
parameter set, then we overwrite the default relation handling function with relation.handling
?
If I assumed correctly: I used modifyList
especially because it does not alter the EDGE.ATTR.HANDLING
object but returns a new one, because I did not want to break things by accident. However, I see your point in that it would provide benefit when overriding it at this point. So I checked it out in code and found that even when overriding it by doing EDGE.ATTR.HANDLING$relation = relation.handling
it will not persist. I tried having two consecutive calls to simplify.network
in one test and even though overriding works in both calls individually, it does not persist into the next calls, i.e., in the second call to simplify.network
, EDGE.ATTR.HANDLING
will have returned to its default. I am not sure what R does there and if it can still be done, maybe you know more :).
Obviously, we cannot move
relation.handling
intoEDGE.ATTR.HANDLING
's definition, as it is not always the way we want to work.
Why not? I don't see any situation at which changing the definition of EDGE.ATTR.HANDLING
would break. Calling "sort" and "unique" should not change anything in the previously existing way of simplification. But maybe I overlook something that I don't have in mind - do you have a certain situation in mind in which replacing the definition would break?
Well I supposed when simplifying multi-relational networks the classical way, but upon looking at the code, I think it would retain functionality. Apart from that, EDGE.ATTR.HANDLING
is only ever accessed in one other location:
I am not sure on the effects of changing the relation
attribute in this call.
According to the documentation of igraph::as.undirected
, edge.attr.comb
should only be effective if mode
is "collapse"
or "mutual"
. As "each"
is used in this case, edge.attr.comb
should not have any effect. Even when we would assume that we would use another mode
here, I don't see any reason why the relation in artifact networks should be handled differently than in author networks.
In PR #250, we introduce a new possibility for simplifying multi-relation edges. However, there are no tests yet for this new feature, and also the edge-attribute handling still needs to be adjusted accordingly. Hence, in this issue, I describe the missing steps that should be approached after PR #250 is merged:
Let me start with an example: Assume that we have an author network containing two vertices A and B, which has multiple relations, namely "mail" and "cochange". Assume that there are three cochange edges and four mail edges between A and B. Then the default simplification reduced the three cochange edges to one cochange edge, and the four mail edges to one mail edge. The new "simplify.multiple.relations" parameter instead simplifies the three cochange and four mail at edges to one single edges that represents all mail and cochange edges together.
Problem: Edge attributes need to be handled properly for the new case. The edge-attribute handling for simplification is specified here: https://github.com/se-sic/coronet/blob/a0c3a5262a8837679e2c04343e10b2e41dcab4df/util-networks.R#L55-L70
Currently, we only take the "first" relation - which was not a problem, was only edges of the same relation were simplified to one edge. Now, when we simplify edges of different relations to the same edge, "first" is wrong, as we need a unique vector of all relations instead of just taking the first one. Unfortunately, the edge-attribute handling of igraph does not provide a dedicated function for that. However, igraph allows to provide a custom function that is applied when handling edge attributes during simplification. Hence, such a custom function is necessary for relation here.
However, changing the edge-attribute handling will also break some other parts of coronet, which assume that the edge attribute relation are only a single element - but now it also can be a vector. For example, this might break the following lines, in which the data sources are determined from the relations - this might break if there are vectors of relations instead of single elements: https://github.com/se-sic/coronet/blob/a0c3a5262a8837679e2c04343e10b2e41dcab4df/util-networks.R#L1673-L1689
It is important, that the new functionality is backwards compatible, this is, if the default simplification procedure is used, then everything should work as before, and only if there are multiple relations combined into a single edge, we need to find a way that all the functions are also able to deal with a vector of relations instead of a single edge.
In addition, PR #250 also lacks of tests. So proper tests should also be added :wink: