wiheto / teneto

Temporal Network Tools
GNU General Public License v3.0

Closeness centrality running without interruption #64

Closed sopeKhadim closed 3 years ago

sopeKhadim commented 4 years ago

Hi, I am using Teneto to calculate the closeness centrality. My code has been running for more than two days without stopping. What is the problem?

Code:

import pandas as pd
import teneto as tn
import numpy as np
from teneto import TemporalNetwork

filename = 'data/Sociopatterns/primaryschool.csv'
df_PS = pd.read_csv(filename, names = ['t', 'i', 'j', 'Ci', 'Cj'], header=None, delimiter='\t')
df_PS_Day = df_PS[ df_PS['t'] < 63000] 
df_PS_final = df_PS_Day[[ 't', 'i', 'j']]
df_PS_final
t i j
0 31220 1558 1567
1 31220 1560 1570
2 31220 1567 1574
3 31220 1632 1818
4 31220 1632 1866
... ... ... ...
60618 62300 1719 1859
60619 62300 1720 1843
60620 62300 1739 1872
60621 62300 1767 1787
60622 62300 1800 1820

60623 rows × 3 columns

# The time resolution between two signals is 20 seconds.
# This function normalizes the node labels and timestamps.
def contact_graph(data, cfg=None, resolution=0): 

    nodes = pd.concat([data['i'], data['j']]) 
    nodes = nodes.sort_values()
    nodes = nodes.unique()
    nb_nodes = len(nodes)

    ## changes the label of each node to be the index of np.array
    keys = nodes
    values = list(range(nb_nodes))
    nodeslabels = dict(zip(keys, values))

    if resolution == 0 :
        times = data['t']
        times = times.sort_values()
        times = times.unique()
        n_times = len(times)
    else:
        times = range(min(data['t']), max(data['t'])+resolution, resolution)
        n_times = len(times)

    ## changes the time counter 
    keys_times = times
    items_times = list(range(n_times))
    timesCounter  = dict(zip(keys_times,items_times ))          

    df = data.copy()
    frame =pd.DataFrame()
    frame['i'] = df['i'].map(nodeslabels)
    frame['j'] = df['j'].map(nodeslabels)
    frame['t'] = df['t'].map(timesCounter)

    contacts = frame[['i', 'j', 't']].to_numpy()
    contacts = contacts[contacts[:, 2].argsort()]

    return contacts
contacts_PS = contact_graph(df_PS_final, resolution=20)
contacts_PS[:50]
array([[ 58,  63,   0],
       [ 59,  64,   0],
       [ 63,  66,   0],
       [ 85, 185,   0],
       [ 85, 209,   0],
       [101, 114,   0],
       [186, 194,   0],
       [186, 209,   0],
       [186, 194,   1],
       [183, 189,   1],
       [141, 187,   1],
       [101, 114,   1],
       [ 85, 185,   1],
       [ 63,  66,   1],
       [ 58,  63,   1],
       [ 85, 209,   1],
       [101, 114,   2],
       [179, 191,   2],
       [179, 181,   2],
       [159, 168,   2],
       [141, 187,   2],
       [ 85, 209,   2],
       [ 58,  62,   2],
       [ 62,  66,   2],
       [ 62,  63,   2],
       [ 59,  64,   2],
       [ 58,  63,   2],
       [ 63,  66,   2],
       [ 85, 186,   3],
       [179, 181,   3],
       [150, 152,   3],
       [101, 114,   3],
       [ 85, 185,   3],
       [ 85, 209,   3],
       [ 62,  66,   3],
       [ 62,  63,   3],
       [ 58,  63,   3],
       [ 58,  62,   3],
       [ 36,  51,   3],
       [ 63,  66,   3],
       [183, 189,   4],
       [150, 153,   4],
       [101, 114,   4],
       [ 85, 209,   4],
       [ 63,  66,   4],
       [ 58,  63,   4],
       [ 17,  39,   4],
       [ 80,  87,   4],
       [162, 164,   5],
       [158, 169,   5]])
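The relabeling trick inside contact_graph (mapping arbitrary node IDs to 0..N-1 and 20-second timestamps to consecutive frame indices) can be checked on a toy edge list. This is a minimal sketch using made-up data in the same format as the SocioPatterns file:

```python
import numpy as np
import pandas as pd

# Toy edge list with non-contiguous node IDs and 20-second timestamps
data = pd.DataFrame({'t': [40, 0, 20, 0],
                     'i': [1558, 1558, 1560, 1632],
                     'j': [1567, 1567, 1570, 1818]})

# Map node IDs to 0..N-1
nodes = pd.concat([data['i'], data['j']]).sort_values().unique()
nodeslabels = dict(zip(nodes, range(len(nodes))))

# Map timestamps to consecutive frame indices at a fixed resolution
resolution = 20
times = range(data['t'].min(), data['t'].max() + resolution, resolution)
timecounter = dict(zip(times, range(len(times))))

frame = pd.DataFrame({'i': data['i'].map(nodeslabels),
                      'j': data['j'].map(nodeslabels),
                      't': data['t'].map(timecounter)})
contacts = frame.to_numpy()
contacts = contacts[contacts[:, 2].argsort()]
print(contacts)
```

The output should show node indices in 0..5 and frame indices 0..2, sorted by time, matching the structure of contacts_PS above.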
tnetHS = TemporalNetwork(from_edgelist=list(contacts_PS))
tnetHS.network.head()
i j t
0 58 63 0
1 59 64 0
2 63 66 0
3 85 185 0
4 85 209 0
closeness = tn.networkmeasures.shortest_temporal_path(tnetHS)
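For reference, temporal closeness centrality is commonly defined as the average reciprocal of the shortest temporal distances from each node, which is why it depends on the shortest-path computation above. A minimal numpy sketch on a small hypothetical distance matrix (not Teneto's actual implementation; np.inf marks unreachable pairs):

```python
import numpy as np

# Hypothetical matrix of shortest temporal distances between 4 nodes
dist = np.array([[0., 1., 2., np.inf],
                 [1., 0., 1., 3.],
                 [2., 1., 0., 2.],
                 [np.inf, 3., 2., 0.]])

n = dist.shape[0]
with np.errstate(divide='ignore'):
    recip = 1.0 / dist              # 1/inf is 0, 1/0 on the diagonal is inf
np.fill_diagonal(recip, 0.0)        # a node's distance to itself is ignored
closeness = recip.sum(axis=1) / (n - 1)
print(closeness)
```

Node 0, for example, gets (1/1 + 1/2 + 0) / 3 = 0.5; unreachable pairs contribute nothing.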
wiheto commented 4 years ago

Hi,

Unfortunately, the current implementation of shortest_temporal_paths is quite slow, especially for larger networks (runtime grows exponentially with size). I've only really used it on networks of about 200 nodes and 400 time points. Increasing its speed is on the todo list.
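To give a rough idea of what the measure computes (this is not Teneto's algorithm, just an illustration), the core of a shortest temporal path is the earliest arrival time: sweep the contacts in time order and propagate reachability. A minimal sketch, under one common convention that a node can relay in the same time step it receives:

```python
import numpy as np

def earliest_arrival(contacts, n_nodes, source):
    """Earliest time each node can be reached from `source`
    over an undirected contact sequence (i, j, t)."""
    contacts = sorted(contacts, key=lambda c: c[2])  # ensure temporal order
    arrival = np.full(n_nodes, np.inf)
    arrival[source] = 0
    for i, j, t in contacts:
        # a contact at time t can forward if one endpoint is already reached
        if arrival[i] <= t and t < arrival[j]:
            arrival[j] = t
        if arrival[j] <= t and t < arrival[i]:
            arrival[i] = t
    return arrival

print(earliest_arrival([(0, 1, 0), (1, 2, 1), (0, 2, 5)], 3, 0))
```

This single-source sweep is linear in the number of contacts; computing all pairwise temporal distances (as closeness requires) multiplies the cost by the number of nodes, which is part of why the full computation gets expensive.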

wiheto commented 4 years ago

So the code should finish eventually, but I can't say how long it will take.

sopeKhadim commented 4 years ago

Thank you for your response. Can we hope that you will soon improve the performance of the shortest-path code for large temporal networks? I think other measures depend on it, like betweenness.

wiheto commented 4 years ago

Can we hope that you will soon improve the performance of the shortest-path code for large temporal networks?

I'll bump it up the priority queue because it is something I've wanted to fix for some time.

zhao-snoe commented 3 years ago

Have you solved your problem? I am currently running into the same issue in my calculation.

zhao-snoe commented 3 years ago

In your article, I saw that you used more than 200 nodes and more than 400 time points. How long did it take you to calculate the closeness centrality? This would give me a useful time reference for my own calculation: I have 60 nodes, 51 networks, and 10% sparsity, and my code has been running for more than four days without stopping.

wiheto commented 3 years ago

Development on the toolbox stalled this year for other reasons.

The original article used an older version of this function, which was quick but could not handle larger networks. The speed issue was introduced into the shortest temporal paths function when the rest of the toolbox was modified to handle larger networks.

You can use an older version of the code which should be quicker:

I think the quicker version can be found here: https://github.com/wiheto/teneto/releases/tag/0.3.5
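If you want to try that older release, pinning it with pip should work (this assumes the 0.3.5 release is still published on PyPI; otherwise install from the GitHub tag linked above):

```shell
# Pin the older, faster release of teneto
pip install teneto==0.3.5
```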

Until then: despite this being a high priority for the toolbox, development is currently something I do in my spare time due to other obligations. I cannot give a time estimate for when I can complete the next version, where the shortest-path estimation will be substantially quicker.

zhao-snoe commented 3 years ago

Thank you for your prompt reply. I will try with the code you recommended.


wiheto commented 3 years ago

I now have time to work on this issue. I'm gathering all the speed-related issues into one; follow #74 for more information.