wiheto / teneto

Temporal Network Tools
GNU General Public License v3.0

Closeness centrality running without interruption #64

Closed sopeKhadim closed 3 years ago

sopeKhadim commented 4 years ago

Hi, I am using Teneto to calculate the closeness centrality. My code has been running for more than two days without stopping. What is the problem?

Code:

import pandas as pd
import teneto as tn
import numpy as np
from teneto import TemporalNetwork

filename = 'data/Sociopatterns/primaryschool.csv'
df_PS = pd.read_csv(filename, names = ['t', 'i', 'j', 'Ci', 'Cj'], header=None, delimiter='\t')
df_PS_Day = df_PS[ df_PS['t'] < 63000] 
df_PS_final = df_PS_Day[[ 't', 'i', 'j']]
df_PS_final
t i j
0 31220 1558 1567
1 31220 1560 1570
2 31220 1567 1574
3 31220 1632 1818
4 31220 1632 1866
... ... ... ...
60618 62300 1719 1859
60619 62300 1720 1843
60620 62300 1739 1872
60621 62300 1767 1787
60622 62300 1800 1820

60623 rows × 3 columns

# The time resolution between two signals is 20 seconds.
# This function normalizes the node labels and timestamps.
def contact_graph(data, cfg=None, resolution=0): 

    nodes = pd.concat([data['i'], data['j']]) 
    nodes = nodes.sort_values()
    nodes = nodes.unique()
    nb_nodes = len(nodes)

    ## changes the label of each node to be the index of np.array
    keys = nodes
    values = list(range(nb_nodes))
    nodeslabels = dict(zip(keys, values))

    if resolution == 0 :
        times = data['t']
        times = times.sort_values()
        times = times.unique()
        n_times = len(times)
    else:
        times = range(min(data['t']), max(data['t'])+resolution, resolution)
        n_times = len(times)

    ## changes the time counter 
    keys_times = times
    items_times = list(range(n_times))
    timesCounter  = dict(zip(keys_times,items_times ))          

    df = data.copy()
    frame =pd.DataFrame()
    frame['i'] = df['i'].map(nodeslabels)
    frame['j'] = df['j'].map(nodeslabels)
    frame['t'] = df['t'].map(timesCounter)

    contacts = frame[['i', 'j', 't']].to_numpy()
    contacts = contacts[contacts[:, 2].argsort()]

    return contacts
contacts_PS = contact_graph(df_PS_final, resolution=20)
contacts_PS[:50]
array([[ 58,  63,   0],
       [ 59,  64,   0],
       [ 63,  66,   0],
       [ 85, 185,   0],
       [ 85, 209,   0],
       [101, 114,   0],
       [186, 194,   0],
       [186, 209,   0],
       [186, 194,   1],
       [183, 189,   1],
       [141, 187,   1],
       [101, 114,   1],
       [ 85, 185,   1],
       [ 63,  66,   1],
       [ 58,  63,   1],
       [ 85, 209,   1],
       [101, 114,   2],
       [179, 191,   2],
       [179, 181,   2],
       [159, 168,   2],
       [141, 187,   2],
       [ 85, 209,   2],
       [ 58,  62,   2],
       [ 62,  66,   2],
       [ 62,  63,   2],
       [ 59,  64,   2],
       [ 58,  63,   2],
       [ 63,  66,   2],
       [ 85, 186,   3],
       [179, 181,   3],
       [150, 152,   3],
       [101, 114,   3],
       [ 85, 185,   3],
       [ 85, 209,   3],
       [ 62,  66,   3],
       [ 62,  63,   3],
       [ 58,  63,   3],
       [ 58,  62,   3],
       [ 36,  51,   3],
       [ 63,  66,   3],
       [183, 189,   4],
       [150, 153,   4],
       [101, 114,   4],
       [ 85, 209,   4],
       [ 63,  66,   4],
       [ 58,  63,   4],
       [ 17,  39,   4],
       [ 80,  87,   4],
       [162, 164,   5],
       [158, 169,   5]])
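The relabeling trick inside contact_graph (mapping arbitrary node IDs to 0..N-1 and 20-second timestamps to consecutive frame indices) can be checked on a toy edge list. This is a minimal sketch using made-up data in the same format as the SocioPatterns file:

```python
import numpy as np
import pandas as pd

# Toy edge list with non-contiguous node IDs and 20-second timestamps
data = pd.DataFrame({'t': [40, 0, 20, 0],
                     'i': [1558, 1558, 1560, 1632],
                     'j': [1567, 1567, 1570, 1818]})

# Map node IDs to 0..N-1
nodes = pd.concat([data['i'], data['j']]).sort_values().unique()
nodeslabels = dict(zip(nodes, range(len(nodes))))

# Map timestamps to consecutive frame indices at a fixed resolution
resolution = 20
times = range(data['t'].min(), data['t'].max() + resolution, resolution)
timecounter = dict(zip(times, range(len(times))))

frame = pd.DataFrame({'i': data['i'].map(nodeslabels),
                      'j': data['j'].map(nodeslabels),
                      't': data['t'].map(timecounter)})
contacts = frame.to_numpy()
contacts = contacts[contacts[:, 2].argsort()]
print(contacts)
```

The output should show node indices in 0..5 and frame indices 0..2, sorted by time, matching the structure of contacts_PS above.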
tnetHS = TemporalNetwork(from_edgelist=list(contacts_PS))
tnetHS.network.head()
i j t
0 58 63 0
1 59 64 0
2 63 66 0
3 85 185 0
4 85 209 0
closeness = tn.networkmeasures.shortest_temporal_path(tnetHS)
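For reference, temporal closeness centrality is commonly defined as the average reciprocal of the shortest temporal distances from each node, which is why it depends on the shortest-path computation above. A minimal numpy sketch on a small hypothetical distance matrix (not Teneto's actual implementation; np.inf marks unreachable pairs):

```python
import numpy as np

# Hypothetical matrix of shortest temporal distances between 4 nodes
dist = np.array([[0., 1., 2., np.inf],
                 [1., 0., 1., 3.],
                 [2., 1., 0., 2.],
                 [np.inf, 3., 2., 0.]])

n = dist.shape[0]
with np.errstate(divide='ignore'):
    recip = 1.0 / dist              # 1/inf is 0, 1/0 on the diagonal is inf
np.fill_diagonal(recip, 0.0)        # a node's distance to itself is ignored
closeness = recip.sum(axis=1) / (n - 1)
print(closeness)
```

Node 0, for example, gets (1/1 + 1/2 + 0) / 3 = 0.5; unreachable pairs contribute nothing.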
wiheto commented 4 years ago

Hi,

Unfortunately, the current implementation of shortest_temporal_paths is quite slow, especially for larger networks (runtime grows exponentially with size). I've only really used it on networks of about 200 nodes and 400 time points. Increasing its speed is on the todo list.
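To give a rough idea of what the measure computes (this is not Teneto's algorithm, just an illustration), the core of a shortest temporal path is the earliest arrival time: sweep the contacts in time order and propagate reachability. A minimal sketch, under one common convention that a node can relay in the same time step it receives:

```python
import numpy as np

def earliest_arrival(contacts, n_nodes, source):
    """Earliest time each node can be reached from `source`
    over an undirected contact sequence (i, j, t)."""
    contacts = sorted(contacts, key=lambda c: c[2])  # ensure temporal order
    arrival = np.full(n_nodes, np.inf)
    arrival[source] = 0
    for i, j, t in contacts:
        # a contact at time t can forward if one endpoint is already reached
        if arrival[i] <= t and t < arrival[j]:
            arrival[j] = t
        if arrival[j] <= t and t < arrival[i]:
            arrival[i] = t
    return arrival

print(earliest_arrival([(0, 1, 0), (1, 2, 1), (0, 2, 5)], 3, 0))
```

This single-source sweep is linear in the number of contacts; computing all pairwise temporal distances (as closeness requires) multiplies the cost by the number of nodes, which is part of why the full computation gets expensive.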

wiheto commented 4 years ago

So the code should finish eventually, but I can't say how long it will take.

sopeKhadim commented 4 years ago

Thank you for your response. Can we hope that you will soon improve the performance of the shortest-path code for large temporal networks? I think other measures depend on it, like betweenness.

wiheto commented 4 years ago

Can we hope that you will soon improve the performance of the shortest-path code for large temporal networks?

I'll bump it up the priority queue because it is something I've wanted to fix for some time.

zhao-snoe commented 3 years ago

Have you solved your problem? I am currently running into the same issue in my calculation.

zhao-snoe commented 3 years ago

In your article, I saw that you used more than 200 nodes and more than 400 time points. How long did it take you to calculate the closeness centrality? This would give me a useful time reference for my own calculation: I have 60 nodes, 51 networks, and 10% sparsity, and my code has been running for more than four days without stopping.

wiheto commented 3 years ago

Development on the toolbox stalled this year for other reasons.

The original article used an older version of this function, which was quick but could not handle larger networks. The speed issue was introduced into the shortest temporal paths function when the rest of the toolbox was modified to handle larger networks.

You can use an older version of the code which should be quicker:

I think the quicker version can be found here: https://github.com/wiheto/teneto/releases/tag/0.3.5
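If you want to try that older release, pinning it with pip should work (this assumes the 0.3.5 release is still published on PyPI; otherwise install from the GitHub tag linked above):

```shell
# Pin the older, faster release of teneto
pip install teneto==0.3.5
```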

Until then: despite this being a high priority for the toolbox, development is currently something I do in my spare time due to other obligations. I cannot give a time estimate for when I can complete the next version, where the shortest-path estimation will be substantially quicker.

zhao-snoe commented 3 years ago

Thank you for your prompt reply. I will try with the code you recommended.


wiheto commented 3 years ago

I now have time to work on this issue. I'm gathering all the speed-related issues into one; follow #74 for more information.