Closed clhunsen closed 6 years ago
Okay, we (@sofie-kemper, @Roger1995, @hechtlC, @bockthom) talked about it, and here are the results: We should definitely support this. By default, the directedness of the networks should still coincide with the edge-construction methodology.
The edge-construction mechanism is referred to as the temporal ordering of the data (or temporally ordered): Who answers to whom? Or: Shall we respect the temporal order of the events/data causing edges? The directedness of the network is referred to as directedness (or directed): Does an edge indicate which author caused the event it results from?
When analyzing preferential attachment, we need to consider the in-degree of each node. For this, we need a directed network! Otherwise, the work of oneself would be added to the in-degree, which is definitely not intended. The question is whether temporal ordering and directedness should be the same here. Our impression is: No!
Consider the following list of e-mails in the same thread, from which we construct a directed network:
Author | Date (Year) | Artifact |
---|---|---|
Alice | 1999 | 1 |
Bob | 2000 | 1 |
Alice | 2001 | 1 |
Charly | 2002 | 1 |
As we do not distinguish between directedness and temporal order at the moment, both logical values are identical. As a result, we get a temporally ordered and directed network. The edgelist is the following:
From | To | Date (Year) |
---|---|---|
Bob | Alice | 2000 |
Alice | Bob | 2001 |
Alice | Alice | 2001 |
Charly | Alice | 2002 |
Charly | Bob | 2002 |
When analyzing in-degrees, Charly gets a poor value (0, to be exact), which does not reflect the story of this thread. Charly's contribution may have been huge, but nobody can see that from the current edgelist.
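
To make the problem concrete, here is a minimal sketch of the temporally ordered, directed construction. It is written in Python purely for illustration and is not the project's R code; the function name `ordered_directed_edges` is hypothetical, and it assumes one edge per distinct earlier author, which is what the edgelist above shows.

```python
from collections import Counter

# The thread from the table above: (author, year), already sorted by date.
mails = [("Alice", 1999), ("Bob", 2000), ("Alice", 2001), ("Charly", 2002)]

def ordered_directed_edges(mails):
    """Each mail points to every distinct author who wrote earlier in the thread."""
    edges = []
    seen = []  # distinct earlier authors, in order of first appearance
    for author, year in mails:
        for earlier in seen:
            # Includes the author's own earlier mails, hence the Alice -> Alice edge.
            edges.append((author, earlier, year))
        if author not in seen:
            seen.append(author)
    return edges

edges = ordered_directed_edges(mails)

# In-degree = number of edges pointing at an author.
in_degree = Counter(target for _, target, _ in edges)
for author in ("Alice", "Bob", "Charly"):
    print(author, in_degree[author])
```

Since Charly wrote the last mail, no edge ever points to Charly in this construction, regardless of how much that mail contributed.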
We need to create a temporally unordered, but directed network instead, so that we are able to look at the network of collaborating developers/authors, while we are still able to know who is the source of which edge or caused the event encoded as an edge. This way, Charly's contribution is better respected than before.
The edgelist for such a network is the following, carefully respecting `From` and `To`:
From | To | Date (Year) |
---|---|---|
Alice | Bob | 1999 |
Alice | Bob | 2001 |
Bob | Alice | 2000 |
Alice | Charly | 1999 |
Alice | Charly | 2001 |
Charly | Alice | 2002 |
Bob | Charly | 2000 |
Charly | Bob | 2002 |
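
A minimal sketch of how such a temporally unordered, directed edgelist could be derived, again in Python for illustration only (not the project's R code). The function name `unordered_directed_edges` is hypothetical, and leaving out self-loops is an assumption read off the table above, which contains none.

```python
from collections import Counter

# Same thread as before: (author, year).
mails = [("Alice", 1999), ("Bob", 2000), ("Alice", 2001), ("Charly", 2002)]

def unordered_directed_edges(mails):
    """Each mail points to every other author of the thread, regardless of date."""
    # Distinct authors in order of first appearance.
    authors = list(dict.fromkeys(author for author, _ in mails))
    return [
        (author, other, year)   # the mail's author stays the source of the edge
        for author, year in mails
        for other in authors
        if other != author      # assumption: no self-loops
    ]

edges = unordered_directed_edges(mails)
in_degree = Counter(target for _, target, _ in edges)
for author in ("Alice", "Bob", "Charly"):
    print(author, in_degree[author])
```

With this edgelist, Charly receives an incoming edge for every mail written by Alice or Bob, so the in-degree is no longer 0.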
In the following, we show the schema defining the edge construction for all combinations of temporally (un-)ordered data and (un-)directed networks.
Author | Date (Year) | Artifact/Thread |
---|---|---|
A | 1999 | 1 |
B | 2000 | 1 |
| | temporally ordered | temporally unordered |
---|---|---|
network directed | A ←(2000)– B | A –(1999)→ B, A ←(2000)– B |
network undirected | A –(2000)– B | A –(1999)– B, A –(2000)– B |
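
The four cells of this schema can be reproduced with one function taking both switches. This is a hedged Python sketch, not the project's R implementation: `build_edges` and its parameter names are hypothetical, and self-loops are skipped for brevity (they cannot occur in this two-author example anyway).

```python
def build_edges(events, temporally_ordered, directed):
    """events: list of (author, year) sorted by year, all on one artifact/thread."""
    edges = []
    for i, (source, year) in enumerate(events):
        # Ordered: only earlier events are candidates; unordered: all events.
        candidates = events[:i] if temporally_ordered else events
        # Distinct other authors; assumption of this sketch: no self-loops.
        targets = list(dict.fromkeys(a for a, _ in candidates if a != source))
        for target in targets:
            if directed:
                edges.append((source, target, year))
            else:
                # Undirected: endpoint order carries no meaning, so normalize it.
                edges.append((*sorted((source, target)), year))
    return edges

events = [("A", 1999), ("B", 2000)]

for ordered in (True, False):
    for directed in (True, False):
        print(ordered, directed, build_edges(events, ordered, directed))
```

In the directed results, the first element of each tuple is the author who caused the event; in the undirected results, the pair is merely sorted.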
The idea is to add two network-configuration parameters, `temporally.ordered` and `directed`, as a substitute for the current parameters for directedness.
directed.temporal
In the function `construct.dependency.network.from.list`, we can remove the argument `directed`, which controls the edge-construction mechanism. The mechanism can then be determined by reading `temporally.ordered` from the network configuration, which is also an argument of the function.
directed.network
When network directedness is enabled, we need to determine, for each pair of developers under consideration (say, `A` and `B`), who is the source of an edge.
So, basically, we first extract the data items for each developer independently. Then, we construct the edges for each pair (`A`, `B`): once using all data items from `A`, with `A` as the source of the edge, and then analogously for `B`.
Important: In the end, the `directed` parameter determines the directedness of the `igraph` object! (see here and here)
As already discussed some time ago, we definitely should support all the different combinations of temporal order and directedness. As it recently became apparent that we need the combinations that are not supported yet, I started to work on this.
I have already thought through how to implement this, and I plan to do it differently than suggested in the previous comment. The general idea is to change as few parts of the code as possible. Therefore, I will only slightly change several code parts and will make use of the function `igraph::as.undirected` in several places.
For the network-configuration parameters, I suggest keeping the name `directed` for configuring whether to construct directed or undirected networks. This parameter will behave as before. Moreover, I suggest naming the second parameter `respect.temporal.order`, as this might be more comprehensible and less ambiguous: this parameter does not have anything to do with the directedness of a network. The default value of `respect.temporal.order` will be `TRUE` in the case of `directed=TRUE` and `FALSE` if `directed=FALSE`, so that we keep the former behavior.
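
The proposed default can be illustrated with a small sketch. It is written in Python for illustration only and does not reflect the project's actual R configuration code: the helper `resolve_network_conf` and the use of `None` for "not set" are assumptions, while the two parameter names come from the proposal above.

```python
def resolve_network_conf(directed, respect_temporal_order=None):
    """Fill in respect.temporal.order when the user did not set it explicitly."""
    if respect_temporal_order is None:
        # Default follows `directed`, which preserves the former behavior.
        respect_temporal_order = directed
    return {
        "directed": directed,
        "respect.temporal.order": respect_temporal_order,
    }

# Former behavior: both values coincide unless stated otherwise.
print(resolve_network_conf(directed=True))
print(resolve_network_conf(directed=False))

# Newly possible: a directed network that ignores the temporal order.
print(resolve_network_conf(directed=True, respect_temporal_order=False))
```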
I will work on the implementation probably tomorrow. Stay tuned!
Fixed with PR #137.
Currently, the configured directedness of the networks defines the edge-construction methodology. We definitely need to distinguish between the two!
Both author and artifact networks can be constructed as either directed or undirected. When constructing bipartite networks (where both are joined into one single network), both need to have the same directedness.
The edge-construction algorithm defines whether the temporal order of events determines the network's structure. Edges can be directed or undirected. (Exception: Call-graph data is inherently directed, so we need to be careful!)
We need to come up with a proper distinction between the directedness of networks and the edge-construction algorithm, so that we are able to construct a bipartite network containing the data of, e.g., a time/order-respecting e-mail-based author network (basically directed) and a co-change-based artifact network (undirected) without any problems.
For problems that occur when we do not distinguish, refer to commit 49a9125f35bb08732d3f110d24768a5a19d92036.
[Further information might be added here.]