Improved UserBase with measurements

odunbar commented 4 years ago

I have added 2 features in this PR:

1) A neighbourhood-based method for constructing the island (rather than the clique-based one). It is now the recommended method, and the default for ContiguousUserBase.
2) Some indicators to get an idea of the 'island' structure of the user base:

number of interior nodes (all neighbours within the subgraph)
number of boundary nodes (with at least one neighbour exterior to the subgraph)
the average number of exterior neighbours that a boundary node has
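As a rough illustration of the three indicators above, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `island_indicators` and the adjacency-dict input are assumptions made for the example (with networkx one could build the dict as `{n: set(G[n]) for n in G}`).

```python
def island_indicators(adjacency, user_nodes):
    """Count interior/boundary nodes of a user base.

    adjacency: dict mapping each node of the FULL graph to its set of neighbours.
    user_nodes: iterable of nodes that make up the user base (the subgraph).
    Both names are hypothetical, chosen for this sketch only.
    """
    users = set(user_nodes)
    interior = 0
    exterior_counts = []  # one entry per boundary node
    for node in users:
        # Neighbours of this node that lie outside the user base.
        n_exterior = sum(1 for nb in adjacency[node] if nb not in users)
        if n_exterior == 0:
            # All neighbours are inside the subgraph (an isolated node
            # also lands here, since it has no exterior neighbours).
            interior += 1
        else:
            exterior_counts.append(n_exterior)
    boundary = len(exterior_counts)
    mean_exterior = sum(exterior_counts) / boundary if boundary else 0.0
    return {"interior": interior, "boundary": boundary, "mean_exterior": mean_exterior}


# Toy check: a ring of 6 nodes with users {0, 1, 2}.
# Node 1 is interior; nodes 0 and 2 each have one exterior neighbour.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
indicators = island_indicators(ring, [0, 1, 2])
```

With these definitions every user node is either interior or boundary, so the two counts always sum to the size of the user base, as in the output below.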

Below is the output from examples/specify_user_base.py with graph nx.watts_strogatz_graph(100000, 12, 0.1, 1)

User base: Full
number of nodes 100000
number of edges 600000

User base: 0.1 fraction of nodes, randomly chosen
number of nodes 10000
number of edges 6143
number of interior nodes: 0
number of boundary nodes 10000
average exterior neighbours of boundary node 10.7911

User base: 0.1 fraction of nodes, chosen using neighbor method
number of nodes 10000
number of edges 36488
number of interior nodes: 2322
number of boundary nodes 7678
average exterior neighbours of boundary node 6.278197447251888

User base: 0.1 fraction of nodes, chosen using clique method
number of nodes 10000
number of edges 30060
number of interior nodes: 717
number of boundary nodes 9283
average exterior neighbours of boundary node 6.629322417321986
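To give a feel for why a neighbourhood-based choice yields more interior nodes than a random one, here is a hedged sketch of one plausible reading of "neighbour method": grow the user base breadth-first from a seed node until it reaches the target size. The function `grow_neighbourhood` and its arguments are hypothetical, and the method actually implemented in this PR may differ.

```python
import random
from collections import deque


def grow_neighbourhood(adjacency, target_size, seed_node=None, rng=None):
    """Select `target_size` nodes by breadth-first growth from a seed.

    adjacency: dict mapping node -> set of neighbours (assumed input format;
    nodes are assumed sortable so that the result is deterministic).
    """
    nodes = sorted(adjacency)
    if not 1 <= target_size <= len(nodes):
        raise ValueError("target_size must be between 1 and the number of nodes")
    rng = rng or random.Random(0)
    if seed_node is None:
        seed_node = rng.choice(nodes)

    selected = {seed_node}
    queue = deque([seed_node])
    while len(selected) < target_size:
        if not queue:
            # The current component is exhausted: restart from a new node.
            restart = rng.choice([n for n in nodes if n not in selected])
            selected.add(restart)
            queue.append(restart)
            continue
        node = queue.popleft()
        for nb in sorted(adjacency[node]):
            if nb not in selected:
                selected.add(nb)
                queue.append(nb)
                if len(selected) == target_size:
                    break
    return selected


# Toy check on a ring of 10 nodes: growth from node 0 stays contiguous.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}
island = grow_neighbourhood(ring, 4, seed_node=0)
```

Because each new node is a neighbour of one already selected, the selection forms a connected patch, which is what pushes up the edge count and the number of interior nodes relative to random sampling.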

odunbar commented 4 years ago

Added some rudimentary plotting (very slow...), but it's nice to see what the islands look like:

[Figures: clique_network, neighbor_network, random_network]

tapios commented 4 years ago

These plots are nice. Could you please export them as pdf and add to the Overleaf (including a plot of the full network alone)? It'll be good to include in the paper.

lubo93 commented 4 years ago

If we want to plot the whole network, I would recommend graph-tool or Gephi. Python/networkx/matplotlib are not the best graph plotting tools.

In graph-tool one can use:

from graph_tool.draw import graph_draw, fruchterman_reingold_layout, sfdp_layout, draw_hierarchy

and

graph_draw(largest_comp, output="graph.png").

For the whole network, Gephi should do the job: https://gephi.org/
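If Gephi is used, the graph first has to be written to a file it can import. A minimal sketch, assuming a plain CSV edge list with `Source` and `Target` columns (a layout Gephi's spreadsheet importer should accept); the helper `write_edge_list` is hypothetical and not part of this repository:

```python
import csv
import io


def write_edge_list(edges, handle):
    """Write (source, target) pairs as a CSV edge list with a header row."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for source, target in edges:
        writer.writerow([source, target])


# Example: write a tiny path graph to an in-memory buffer.
# For a real export, pass an open file instead, e.g.
# open("edges.csv", "w", newline="").
buffer = io.StringIO()
write_edge_list([(0, 1), (1, 2)], buffer)
```

For a networkx graph `G`, the edges could be passed as `G.edges()`.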

odunbar commented 4 years ago

@tapios The plotting is incredibly slow. These plots are for another graph (not the NYC data); it has 1500 points and I just used it as an illustration.

We do have a 1000-node version of the NYC data, but it isn't very clear (it's hard to see what's going on, as there is a very dense pack of nodes). I could try the 10k one, but the plotting tools are, as Lucas notes, very slow for more than 2000 nodes (i.e. likely on the order of hours). In that case I may try Lucas's suggestion of proper tools for the larger graph; these may also give a clearer representation of it.

lubo93 commented 4 years ago

Yes, I agree. We can use Gephi if we need some better visualizations (it targets networks of up to 100,000 nodes and 1,000,000 edges).

tapios commented 4 years ago

Ok. These figures are not crucial. If we end up including an island simulation in the paper, it might be nice to have an illustration along the lines of what you had. It can be from a small network if that is clearer.

odunbar commented 4 years ago

I ran the neighbour/clique/random networks at 5% of the nodes of the NYC 1e4 data. These are the results (in pdf); do you think they are usable? You can see that they aren't as clear as the toy example graph! They do, however, show that the random-nodes graph has fewer interconnections (1:1 nodes:edges and 0 interior nodes) than the graph constructed based on neighbourhoods (1:2 nodes:edges, and ~10% interior nodes). The percentage of interior nodes increases with larger user population size.

neighbor_network.pdf
clique_network.pdf (this pdf seems to have fewer edges than the png version)
random_network.pdf

These plots only take ~10-20 mins.

odunbar commented 4 years ago

I will merge the PR, and we can create images with the plot_user_base.py script. NB: I also tried using Gephi, but it wasn't playing very well with my system, so it is possible we can get better representations later.

tapios commented 4 years ago

They look ok. But maybe the toy example is preferable.

tapios / risk-networks

Improved UserBase with measurements #76