harvardnlp / botnet-detection

Topological botnet detection datasets and graph neural network applications
MIT License

Graphs have parallel edges #11

Open · iamgroot42 opened this issue 3 years ago

iamgroot42 commented 3 years ago

It seems the data pre-processing does not convert the graphs to simple graphs. Even though the graphs are undirected and unweighted, some pairs of nodes are connected by multiple parallel edges. This seems to affect graphs loaded via torch_geometric and dgl; networkx already handles this.
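For illustration, here is a minimal sketch of what "converting to a simple graph" means for an edge list in COO form (two parallel lists of source and target node ids, the layout torch_geometric uses for `edge_index`). It assumes undirected graphs are stored with both directions, so a parallel edge is an exact repeat of a `(src, dst)` pair. The function name `remove_parallel_edges` is hypothetical, not part of this repo. In practice the libraries have their own utilities (e.g. DGL's `dgl.to_simple`, PyG's `torch_geometric.utils.coalesce`); check their docs for exact signatures.

```python
from typing import List, Tuple


def remove_parallel_edges(src: List[int], dst: List[int]) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: drop repeated (src, dst) pairs, keeping the first occurrence."""
    seen = set()
    new_src: List[int] = []
    new_dst: List[int] = []
    for u, v in zip(src, dst):
        if (u, v) not in seen:  # keep only the first copy of each directed pair
            seen.add((u, v))
            new_src.append(u)
            new_dst.append(v)
    return new_src, new_dst


# Toy graph: the undirected edge 0-1 is stored twice in each direction.
src = [0, 1, 0, 1, 1, 2]
dst = [1, 0, 1, 0, 2, 1]
print(remove_parallel_edges(src, dst))  # ([0, 1, 1, 2], [1, 0, 2, 1])
```

Keeping the first occurrence preserves the original edge order, which matters if edge features are stored in a parallel array.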

Thankfully, the number of such parallel edges is small: a quick check on the train and validation sets shows ~11 extra parallel edges per graph on average. It's not a big issue, though, and it should not impact model performance or results in any way.
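A check like the one described can be sketched as follows: for each graph, count how many edges are duplicates of an already-seen `(src, dst)` pair, then average over the dataset. This is a hedged illustration on made-up toy graphs, not the script used above, and the helper names are hypothetical. Whether the ~11 figure counts a duplicated undirected edge once or twice (once per stored direction) is not stated; this sketch counts directed pairs, so it would count it twice.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def count_extra_parallel_edges(src: Sequence[int], dst: Sequence[int]) -> int:
    """Hypothetical helper: number of edges beyond the first copy of each (src, dst) pair."""
    counts = Counter(zip(src, dst))
    return sum(c - 1 for c in counts.values())


def mean_extra_parallel_edges(graphs: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average extra-edge count over a list of (src, dst) edge lists."""
    if not graphs:
        return 0.0
    return sum(count_extra_parallel_edges(s, d) for s, d in graphs) / len(graphs)


# Toy dataset: the first graph stores 0->1 and 1->0 twice each (2 extra edges),
# the second graph is already simple (0 extra edges).
graphs = [
    ([0, 1, 0, 1], [1, 0, 1, 0]),
    ([0, 1, 1, 2], [1, 0, 2, 1]),
]
print(mean_extra_parallel_edges(graphs))  # 1.0
```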

jzhou316 commented 3 years ago

Hi @iamgroot42, thanks for pointing this out! We saw your pull request and will merge it. Note that torch_geometric also provides a data method to check for or remove multi-edges. Multi-edges can also be useful in other contexts, e.g. when edges carry different labels, or when edges from different times are superposed into a single graph, where the number of parallel edges between two nodes reflects the frequency or importance of that connection.
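To make the last point concrete, here is a minimal sketch of superposing edge lists from several time steps and collapsing the resulting multi-edges into a single weighted edge whose weight is the multiplicity. The snapshots and the helper name `collapse_to_weighted` are hypothetical. This is roughly what DGL's `dgl.to_simple` does when asked to return edge counts, or what PyG's `coalesce` does when summing an all-ones edge attribute; consult their documentation for the exact arguments.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def collapse_to_weighted(src: Sequence[int], dst: Sequence[int]) -> Dict[Tuple[int, int], int]:
    """Hypothetical helper: map each distinct (src, dst) pair to how many times it occurs."""
    return dict(Counter(zip(src, dst)))


# Hypothetical snapshots of the same node set at three time steps.
snapshots: List[Tuple[List[int], List[int]]] = [
    ([0, 1], [1, 2]),
    ([0], [1]),
    ([0, 2], [1, 0]),
]

# Superpose all snapshots into one multigraph edge list.
src = [u for s, _ in snapshots for u in s]
dst = [v for _, d in snapshots for v in d]

weights = collapse_to_weighted(src, dst)
print(weights)  # {(0, 1): 3, (1, 2): 1, (2, 0): 1}
```

Keeping the multi-edges instead of the counts retains per-edge information (e.g. a timestamp or label on each copy); collapsing them trades that for a smaller, weighted simple graph.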