The extraction of the subnetwork in the Github dataset

Hi, thanks for providing this implementation of dynamic network representation. I am currently researching dynamic network embedding, and I am curious about how the subnetwork of 284 users in the GitHub dataset used in your code was extracted.

I downloaded the whole 2013 dataset and preprocessed it into records of the form [src_id, dst_id, snapshot_id], keeping only events of type "FollowEvent" and using a 7-day interval between adjacent snapshots. After removing nodes with only one incident edge, more than 900k nodes remain. When I generate the adjacency matrices for the different snapshots, I find that most nodes have degrees lower than 20. However, if I keep only the nodes whose degree exceeds some threshold (like 20), the network induced by the selected nodes is quite different: the high degree of a node may come from edges to low-degree nodes that have been removed, so many of the selected "high-degree" nodes become low-degree after the selection.
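The degree drift described above is what iterative pruning (a k-core decomposition) avoids: instead of filtering once, nodes with degree below k are removed repeatedly until every remaining node has degree at least k within the remaining graph. This is not necessarily how the 284-user subnetwork was built; it is a minimal sketch assuming raw events are hypothetical (src_id, dst_id, unix_timestamp) tuples and that degree is counted on the undirected graph aggregated over all snapshots. The helper names `to_snapshot_edges`, `k_core_nodes` and `induced_edges` are illustrative only (networkx offers the same pruning as `nx.k_core`).

```python
from collections import defaultdict

WEEK_SECONDS = 7 * 24 * 3600  # 7-day snapshot interval, as in the preprocessing above


def to_snapshot_edges(events, t0):
    """Map hypothetical (src, dst, unix_ts) FollowEvent tuples to (src, dst, snapshot_id)."""
    return [(s, d, (ts - t0) // WEEK_SECONDS) for s, d, ts in events]


def k_core_nodes(edges, k):
    """Nodes that keep degree >= k after iteratively removing lower-degree nodes.

    Degrees are counted on the undirected, deduplicated graph aggregated
    over all snapshots; self-loops are ignored.
    """
    adj = defaultdict(set)
    for s, d, _ in edges:
        if s != d:
            adj[s].add(d)
            adj[d].add(s)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    removed = set()
    stack = [n for n, deg in degree.items() if deg < k]
    while stack:
        n = stack.pop()
        if n in removed:
            continue
        removed.add(n)
        # Removing n lowers its surviving neighbours' degrees, which may push
        # them below k as well; this is what a one-shot degree filter misses.
        for m in adj[n]:
            if m not in removed:
                degree[m] -= 1
                if degree[m] < k:
                    stack.append(m)
    return set(adj) - removed


def induced_edges(edges, nodes):
    """Keep only the snapshot edges whose endpoints are both in `nodes`."""
    return [(s, d, t) for s, d, t in edges if s in nodes and d in nodes]


# Example: a triangle (1, 2, 3) with a pendant node 4 attached to 3.
edges = [(1, 2, 0), (2, 3, 0), (1, 3, 1), (3, 4, 1)]
core = k_core_nodes(edges, k=2)
print(sorted(core))                # [1, 2, 3]
print(induced_edges(edges, core))  # [(1, 2, 0), (2, 3, 0), (1, 3, 1)]
```

Raising k shrinks the core, so one could increase k until the surviving set is roughly the desired size and then build the per-snapshot adjacency matrices from the induced edges only.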

One possible approach I have considered is to perform community clustering across snapshots, but I do not yet see how to implement it efficiently given the large number of nodes.
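For the community-clustering idea, label propagation is one option whose cost per sweep is roughly linear in the number of edges, which matters at the 900k-node scale. The sketch below is a plain-Python version of asynchronous label propagation on the undirected graph aggregated over snapshots, assuming the same hypothetical (src_id, dst_id, snapshot_id) edge list; it is a swapped-in technique, not the authors' selection procedure, and the function name `label_propagation` is illustrative. For a library implementation, networkx provides `asyn_lpa_communities` in `networkx.algorithms.community`.

```python
import random
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=20, seed=0):
    """Asynchronous label propagation on the undirected graph aggregated over snapshots.

    `edges` is a list of hypothetical (src_id, dst_id, snapshot_id) tuples;
    snapshot ids are ignored here. Returns a list of node sets (communities).
    """
    adj = defaultdict(set)
    for s, d, _ in edges:
        if s != d:
            adj[s].add(d)
            adj[d].add(s)
    labels = {n: n for n in adj}  # every node starts in its own community
    nodes = list(adj)
    rng = random.Random(seed)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # visit nodes in a random order each sweep
        changed = False
        for n in nodes:
            # Adopt the label held by most neighbours.
            counts = Counter(labels[m] for m in adj[n])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[n] in candidates:
                continue  # keep the current label on ties, which helps convergence
            labels[n] = rng.choice(candidates)
            changed = True
        if not changed:
            break  # no node changed label during a full sweep
    communities = defaultdict(set)
    for n, lab in labels.items():
        communities[lab].add(n)
    return list(communities.values())


# Example: two disconnected triangles form two communities.
edges = [(1, 2, 0), (2, 3, 0), (1, 3, 1), (4, 5, 0), (5, 6, 1), (4, 6, 2)]
print(sorted(sorted(c) for c in label_propagation(edges)))  # [[1, 2, 3], [4, 5, 6]]
```

A candidate subnetwork could then be a single community of suitable size, optionally pruned further by iteratively dropping low-degree members so the remaining nodes stay densely connected.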

So I wonder how you selected the 284 nodes to ensure that they have dense events with each other. Could you share your selection code? Looking forward to your reply.

uoguelph-mlrg / LDG, issue #5