se-sic / coronet

coronet – the R library for configurable and reproducible construction of developer networks
GNU General Public License v2.0

Test PaSta + Synchronicity, equality of different network constructions #226

Closed: joba00002 closed this 2 years ago

joba00002 commented 2 years ago

Prerequisites

Description

Addresses #86

Changelog

joba00002 commented 2 years ago

As it turns out, despite three existing tests asserting exactly that, the following two approaches do not generally produce the same networks: calling split.networks.time.based on the author and bipartite networks, and splitting a multinetwork and then extracting the author and bipartite networks from the result.

This can be seen in the sample data by choosing author.relation = 'cochange' and artifact.relation = 'mail'.

The difference is in how isolated nodes are treated. I suspect that the existing tests only worked because with our sample data, there were no isolated nodes.

joba00002 commented 2 years ago

It might still make sense to assert that networks are equal when setting remove.isolates = FALSE. Unfortunately, this currently does not work as extract.bipartite.network has no such option.

joba00002 commented 2 years ago

> It might still make sense to assert that networks are equal when setting remove.isolates = FALSE. Unfortunately, this currently does not work as extract.bipartite.network has no such option.

I have added this option and changed the tests to use it.

joba00002 commented 2 years ago

The existing tests asserted something that may still not be generally true: For the results after splitting to be equal, all networks included in split.networks.time.based must together cover the same time span as the multinetwork. Otherwise, the networks will not use the same bins for splitting.

On the sample data, this was the case for author and bipartite networks, but it is not the case for author and artifact or bipartite and artifact networks.

However, using all three networks (author, artifact and bipartite) should ensure that the networks are split into the same bins.
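To illustrate why the covered time span matters, here is a minimal sketch in Python. It is not coronet's implementation: the binning rule (start at the earliest timestamp, step by a fixed period until the latest timestamp is covered), the helper name `make_bins`, and all timestamps are assumptions made up for this example.

```python
def make_bins(timestamps, period):
    """Hypothetical binning rule: boundaries start at the earliest
    timestamp and advance by `period` until the latest one is covered."""
    start, end = min(timestamps), max(timestamps)
    bins = [start]
    while bins[-1] <= end:
        bins.append(bins[-1] + period)
    return bins

# Made-up edge timestamps per network type (not taken from the sample data).
author_times = [0, 4, 9]
bipartite_times = [2, 6, 11]
artifact_times = [3, 5]
period = 5

# The multinetwork contains the edges of all three networks.
multi_bins = make_bins(author_times + bipartite_times + artifact_times, period)

# Bins derived from different subsets of networks.
author_bipartite_bins = make_bins(author_times + bipartite_times, period)
author_artifact_bins = make_bins(author_times + artifact_times, period)
bipartite_artifact_bins = make_bins(bipartite_times + artifact_times, period)

print(multi_bins)               # same span as author + bipartite
print(author_bipartite_bins)
print(author_artifact_bins)     # shorter span, fewer bins
print(bipartite_artifact_bins)  # later start, shifted bins
```

In this toy data, only the author and bipartite networks together span the full range, so only that pair yields the multinetwork's bins. Passing all three networks always covers the full range by construction.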

bockthom commented 2 years ago

> As it turns out, despite three existing tests asserting exactly that, the following two approaches do not generally produce the same networks: calling split.networks.time.based on the author and bipartite networks, and splitting a multinetwork and then extracting the author and bipartite networks from the result.
>
> This can be seen in the sample data by choosing author.relation = 'cochange' and artifact.relation = 'mail'.
>
> The difference is in how isolated nodes are treated. I suspect that the existing tests only worked because with our sample data, there were no isolated nodes.

First of all, sorry for the late reply. Our mail servers have been unavailable and are (still!) unavailable, so I might not notice new comments right away.

Thanks for spotting all these inconsistencies and bugs, and thanks for adding a remove.isolates parameter for bipartite networks. Sounds good!

I will probably have a look at your changes today or tomorrow, but can you please briefly explain what exactly the difference is in how "isolated nodes are treated"? Knowing that would be helpful for understanding and reviewing your changes.

joba00002 commented 2 years ago

The difference is whether, or rather when, isolated nodes are removed.

Removing isolated nodes first and then splitting may lead to a different result than splitting first and then removing isolated nodes.
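A minimal, library-independent sketch in Python of why the order matters. The helpers `remove_isolates` and `split_by_time` are hypothetical stand-ins, not coronet functions, and the sketch assumes that splitting keeps every vertex in every time slice.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, int]  # (source, target, timestamp)

def remove_isolates(vertices: Set[str], edges: List[Edge]) -> Set[str]:
    """Keep only vertices that are an endpoint of at least one edge."""
    connected = {v for (a, b, _) in edges for v in (a, b)}
    return vertices & connected

def split_by_time(vertices: Set[str], edges: List[Edge], bins: List[int]):
    """Split edges into half-open bins [start, end).
    Assumption: every slice keeps the full vertex set."""
    return [
        (set(vertices), [e for e in edges if start <= e[2] < end])
        for start, end in zip(bins, bins[1:])
    ]

vertices = {"A", "B", "C"}
edges = [("A", "B", 1), ("B", "C", 5)]
bins = [0, 3, 6]

# Order 1: remove isolates on the full network, then split.
# No vertex is isolated in the full network, so all three survive in each slice.
kept = remove_isolates(vertices, edges)
isolates_first = [v for v, _ in split_by_time(kept, edges, bins)]

# Order 2: split first, then remove isolates within each slice.
# "C" is isolated in the first slice and "A" in the second.
split_first = [
    remove_isolates(v, e) for v, e in split_by_time(vertices, edges, bins)
]

print(isolates_first)
print(split_first)
```

Here every slice keeps A, B, and C in the first order, while the second order drops C from the first slice and A from the second, so the resulting networks differ.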