se-sic / coronet

coronet – the R library for configurable and reproducible construction of developer networks
GNU General Public License v2.0

Distinguish directedness of networks and edge-construction algorithm #6

Closed clhunsen closed 6 years ago

clhunsen commented 7 years ago

Currently, the configured directedness of the networks defines the edge-construction methodology. We definitely need to distinguish here!

Both author networks and artifact networks can be constructed as either directed or undirected networks. When constructing bipartite networks (where both are unioned into one single network), both need to be directed or both need to be undirected.

The edge-construction algorithm defines whether the temporal order of events determines the network's structure. Edges can be directed or undirected. (Exception: Call-graph data is inherently directed, so we need to be careful!)

We need to come up with a proper distinction between the directedness of networks and the edge-construction algorithm, so that we are able to construct a bipartite network containing the data of, e.g., a time/order-respecting e-mail-based author network (basically directed) and a co-change-based artifact network (undirected) without any problems.

For problems that occur when we do not distinguish, refer to commit 49a9125f35bb08732d3f110d24768a5a19d92036.

[Further information might be added here.]

clhunsen commented 7 years ago

Okay, we (@sofie-kemper, @Roger1995, @hechtlC, @bockthom) talked about it and here are the results: We should definitely support this, while the default setting is that the directedness of the networks still is identical with the edge-construction methodology.

Important note

The edge-construction mechanism is referred to as the temporal ordering of the data (or temporally ordered): Who answers to whom? Or: Shall we respect the temporal order of the events/data causing edges? The directedness of the network is referred to as directedness (or directed): Does the edge indicate which author caused the event resulting in the edge?

Reasoning and example

When analyzing preferential attachment, we need to consider the in-degree of each node. For this, we need a directed network! Otherwise, the work of oneself would be added to the in-degree, which is definitely not intended. The question is whether temporal ordering and directedness should be the same here. Our impression is: No!

Consider the following ordered list of e-mails in the same thread, from which we construct a directed network:

| Author | Date (Year) | Artifact |
|---|---|---|
| Alice | 1999 | 1 |
| Bob | 2000 | 1 |
| Alice | 2001 | 1 |
| Charly | 2002 | 1 |

Current result

As we do not distinguish the directedness and the temporal order at the moment, both logical values are identical. As a result, we get a temporally ordered and directed network. The edgelist is the following:

| From | To | Date (Year) |
|---|---|---|
| Bob | Alice | 2000 |
| Alice | Bob | 2001 |
| Alice | Alice | 2001 |
| Charly | Alice | 2002 |
| Charly | Bob | 2002 |

When analyzing in-degrees, Charly does not get a meaningful value (it is 0, to be exact), which does not reflect the story of this thread. Charly's contribution may have been huge, but nobody can see that from the current edgelist.
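A minimal Python sketch (not coronet's R code) of one reading of the temporally ordered, directed construction that reproduces the edgelist above: each e-mail yields one edge from its author to every distinct author who posted earlier in the thread, dated with the later e-mail. The names `ordered_directed_edges` and `in_degrees` are hypothetical, and the deduplication of earlier authors is an assumption inferred from the table, not a statement about the actual implementation.

```python
from collections import Counter

# E-mails of one thread as (author, year), already sorted by date.
THREAD = [("Alice", 1999), ("Bob", 2000), ("Alice", 2001), ("Charly", 2002)]


def ordered_directed_edges(mails):
    """Return (from, to, year) edges pointing from each mail's author
    to every distinct author of an earlier mail (assumed semantics)."""
    edges = []
    earlier_authors = []  # distinct authors seen so far, in order of first mail
    for author, year in mails:
        for earlier in earlier_authors:
            edges.append((author, earlier, year))
        if author not in earlier_authors:
            earlier_authors.append(author)
    return edges


def in_degrees(edges, authors):
    """Count incoming edges per author, including authors with none."""
    counts = Counter(target for _, target, _ in edges)
    return {author: counts[author] for author in authors}


edges = ordered_directed_edges(THREAD)
degrees = in_degrees(edges, ["Alice", "Bob", "Charly"])
print(sorted(edges))
print(degrees)
```

On this input the sketch yields the five edges listed above, and Charly's in-degree is 0 because nobody posted after Charly.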

Wanted result

We need to create a temporally unordered, but directed network instead, so that we are able to look at the network of collaborating developers/authors, while we are still able to know who is the source of which edge or caused the event encoded as an edge. This way, Charly's contribution is better respected than before.

The edgelist for such a network is the following. Note that the From and To columns matter here, as From always names the author of the e-mail that causes the edge:

| From | To | Date (Year) |
|---|---|---|
| Alice | Bob | 1999 |
| Alice | Bob | 2001 |
| Bob | Alice | 2000 |
| Alice | Charly | 1999 |
| Alice | Charly | 2001 |
| Charly | Alice | 2002 |
| Bob | Charly | 2000 |
| Charly | Bob | 2002 |

Schema

In the following, we show the schema defining the edge construction for all four combinations of temporally ordered or unordered data and directed or undirected networks.

Raw data

| Author | Date (Year) | Artifact/Thread |
|---|---|---|
| A | 1999 | 1 |
| B | 2000 | 1 |

Networks

| | temporally ordered | temporally unordered |
|---|---|---|
| network directed | A ←(2000)– B | A –(1999)→ B<br>A ←(2000)– B |
| network undirected | A –(2000)– B | A –(1999)– B<br>A –(2000)– B |

Implementation

The idea is to add two network-configuration parameters, temporally.ordered and directed, as a substitute for the current parameters for directedness.

directed.temporal

In the function construct.dependency.network.from.list, we can remove the argument directed, which currently controls the edge-construction mechanism. Instead, the mechanism can be determined by reading temporally.ordered from the network configuration, which is also an argument of the function.

directed.network

When network directedness is enabled, we need to distinguish for each pair/combination of developers (say, A and B) under consideration who is the source of an edge. So, basically, we extract the data items for each developer independently first. Then, we construct edges for each pair (A, B), once with all data items from A and A as source of the edge, and, finally, the analogous step for B.
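The per-pair procedure above might be sketched as follows, in Python rather than coronet's R and with the hypothetical helper `unordered_directed_edges`. It groups the data items per developer and then, for every ordered pair of distinct developers, emits one edge per data item of the source developer. The input is the four-mail thread from the earlier example; this is an illustration under those assumptions, not the library's implementation.

```python
from collections import defaultdict
from itertools import permutations

# E-mails of one thread as (author, year).
THREAD = [("Alice", 1999), ("Bob", 2000), ("Alice", 2001), ("Charly", 2002)]


def unordered_directed_edges(items):
    # Step 1: extract the data items (here: years) of each developer independently.
    by_author = defaultdict(list)
    for author, year in items:
        by_author[author].append(year)
    # Step 2: for each ordered pair (source, target), every data item of
    # the source becomes one edge source -> target.
    edges = []
    for source, target in permutations(by_author, 2):
        for year in by_author[source]:
            edges.append((source, target, year))
    return edges


edges = unordered_directed_edges(THREAD)

# Count incoming edges per developer.
in_degree = defaultdict(int)
for _, target, _ in edges:
    in_degree[target] += 1

print(sorted(edges))
print(dict(in_degree))
```

On this input the sketch yields the eight edges of the "Wanted result" edgelist, and Charly's in-degree becomes 3 instead of 0.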

Important: In the end, the directed parameter determines the directedness of the igraph object! (see here and here)


Open questions

bockthom commented 6 years ago

As already discussed some time ago, we definitely should support all the different combinations of temporal order and directedness. As it recently became apparent that we need the combinations that are not supported yet, I started to work on this.

I have already thought through how to implement this, and I plan to do it differently than suggested in the previous comment. The general idea is to change as few parts of the code as possible. Therefore, I will only slightly change several code parts and will make use of the function igraph::as.undirected in several places.
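To illustrate why dropping direction after the fact can be enough, here is a small Python sketch (plain Python, not igraph) that checks the idea against the schema table from the previous comment: the undirected cells equal the directed cells with the direction removed. The helper `drop_direction` is hypothetical and keeps every edge, which roughly corresponds to what igraph calls mode "each"; which mode the actual implementation uses is not stated in this thread.

```python
def drop_direction(edges):
    """Turn (from, to, year) edges into undirected ones, keeping every edge.
    Endpoints are sorted so that A-B and B-A compare equal."""
    return [(*sorted((source, target)), year) for source, target, year in edges]


# Directed cells of the schema table, written as (from, to, year).
ordered_directed = [("B", "A", 2000)]
unordered_directed = [("A", "B", 1999), ("B", "A", 2000)]

print(drop_direction(ordered_directed))    # [('A', 'B', 2000)]
print(drop_direction(unordered_directed))  # [('A', 'B', 1999), ('A', 'B', 2000)]
```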

For the network-configuration parameters, I suggest keeping the name directed for configuring whether to construct directed or undirected networks. This parameter will behave as before. Moreover, I suggest naming the second parameter respect.temporal.order, as this might be more comprehensible and less ambiguous: this parameter does not have anything to do with the directedness of a network. The default value of respect.temporal.order will be TRUE if directed=TRUE and FALSE if directed=FALSE, so that we keep the former behavior.
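The proposed default could be sketched like this in Python. The function `resolve_network_conf` and the use of `None` for "not set by the user" are assumptions made for illustration; only the parameter names directed and respect.temporal.order and the fallback rule come from the proposal above.

```python
def resolve_network_conf(directed, respect_temporal_order=None):
    """Fill in respect.temporal.order from directed when it is not set,
    so that the former behavior is kept by default."""
    if respect_temporal_order is None:
        respect_temporal_order = directed
    return {
        "directed": directed,
        "respect.temporal.order": respect_temporal_order,
    }


# Former behavior: both values coincide when only directed is given.
print(resolve_network_conf(directed=True))
print(resolve_network_conf(directed=False))
# Newly supported combination: directed, but temporally unordered.
print(resolve_network_conf(directed=True, respect_temporal_order=False))
```

Setting both values explicitly gives the two combinations that were not available before, for example the directed but temporally unordered network from the Charly example.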

I will work on the implementation probably tomorrow. Stay tuned!

clhunsen commented 6 years ago

Fixed with PR #137.