harvardnlp / botnet-detection

Topological botnet detection datasets and graph neural network applications
MIT License

Undirected or directed? And how to create a graph? #23

Open TruongDuyLongPTIT opened 2 years ago

TruongDuyLongPTIT commented 2 years ago

Hi, I need your help.

1) In your paper you say: "All the graphs are undirected and preprocessed to have self-loops to speed up training". You also say: "we propose to use a random walk style normalization $\bar{A} = D^{-1}A$ which only involves the degree of the source nodes to equate the normalized adjacency matrix to the corresponding probability transition matrix". I think the term "degree of the source nodes" is equivalent to "out-degree", but "out-degree" only applies to directed graphs. This confuses me: is your graph undirected or directed?
2) Why do self-loops speed up training? And what does "normalized adjacency matrix to the corresponding probability transition matrix" mean?
3) I looked at your code in the botgen folder. It seems to create a botnet by picking some nodes at random. By randomly selecting bot nodes, is it possible to create a botnet with the same topology as in reality, and why?

Thanks for your help!

jzhou316 commented 1 year ago

Hi there! Here are some clarifications:

  1. "Degree of source node": this is for normalizing the "message" on the edges. Each edge e = (A, B) has two end nodes: a "source node" A and a "destination node" B. These two nodes are usually different (except for self-loops, where A = B). So here we just mean that we use the degree of the source node A instead of that of B. This applies to both undirected and directed graphs: in undirected graphs it is just the degree; in directed graphs it could be the in-degree or the out-degree, in which case we would use the out-degree.
  2. Self-loops ensure that the messages (hidden vectors from neighbors) received by a node B at some layer l are maintained at the next layer l + 1, when a new round of message aggregation happens. This is because the current information at node B is passed to itself through the self-loop. Adding self-loops to the data is a convenient way of keeping this information without changing the model logic.
  3. Botnets follow certain topologies in their connections. The code essentially generates a botnet with the expected topology and then overlays it onto a larger graph by randomly matching the nodes. The subnetwork with the botnet topology does not change; only its location in the full network can differ.
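To make points 1 and 2 concrete, here is a minimal NumPy sketch of the random walk style normalization $\bar{A} = D^{-1}A$ with self-loops added first. It is an illustration only, not the repository's code: the function name `random_walk_normalize` and the convention that entry `(i, j)` is an edge from source `i` to destination `j` are assumptions made for this example.

```python
import numpy as np


def random_walk_normalize(adj, add_self_loops=True):
    """Return D^{-1} A for a dense adjacency matrix (illustrative helper).

    Assumed convention: adj[i, j] != 0 means an edge from source i to
    destination j, so D holds the row sums (the source-node degrees).
    """
    a = np.array(adj, dtype=float)  # np.array copies, so the input is untouched
    if add_self_loops:
        np.fill_diagonal(a, 1.0)  # every node also sends a message to itself
    deg = a.sum(axis=1)  # degree of each source node (out-degree if directed)
    inv_deg = np.zeros_like(deg)
    nonzero = deg > 0
    inv_deg[nonzero] = 1.0 / deg[nonzero]  # isolated nodes keep an all-zero row
    return inv_deg[:, None] * a  # divide row i by deg[i]


# Undirected path graph 0 - 1 - 2 (symmetric adjacency matrix).
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
P = random_walk_normalize(path)
print(P)
print(P.sum(axis=1))
```

Each row of `P` sums to 1, so `P` can be read as a probability transition matrix. Note also that `P` is no longer symmetric even though the graph is undirected, because each edge weight is divided by the degree of its source end only.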

Hope this helps!
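For point 3, the overlay step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in the botgen folder: `overlay_botnet`, the edge-set representation, and the star-shaped example topology are all hypothetical and chosen only to show the idea of random node matching.

```python
import random


def overlay_botnet(background_edges, num_nodes, botnet_edges, num_bots, seed=0):
    """Overlay a fixed botnet topology onto randomly chosen background nodes.

    Hypothetical helper: graphs are undirected and stored as sets of
    (small, large) node-id pairs. Botnet nodes are numbered 0..num_bots-1.
    """
    rng = random.Random(seed)
    # Randomly match each botnet node k to a distinct background node hosts[k].
    hosts = rng.sample(range(num_nodes), num_bots)
    # Relabel the botnet edges; the connection pattern itself is unchanged.
    mapped = {tuple(sorted((hosts[u], hosts[v]))) for u, v in botnet_edges}
    combined = {tuple(sorted(e)) for e in background_edges} | mapped
    return combined, hosts, mapped


# Background: a 6-node path. Botnet: a star with node 0 as the hub.
background = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}
star = [(0, 1), (0, 2), (0, 3)]
combined, hosts, mapped = overlay_botnet(background, 6, star, num_bots=4, seed=1)
print("bot nodes:", hosts)
print("botnet edges in the full graph:", sorted(mapped))
```

Changing `seed` moves the botnet to different nodes of the background graph, but the mapped edges always form the same star, which is the sense in which only the location changes and not the topology.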

TruongDuyLongPTIT commented 1 year ago

Thanks. You helped me a lot.