pygod-team / pygod

A Python Library for Graph Outlier Detection (Anomaly Detection)
https://pygod.org
BSD 2-Clause "Simplified" License
1.31k stars 127 forks

BOND data possible inconsistency #70

Closed realfolkcode closed 1 year ago

realfolkcode commented 1 year ago

Describe the bug

The BOND paper states that all the datasets are undirected, except Weibo:

Note that Weibo is a directed graph; the remaining datasets used in our benchmark are undirected graphs.

However, the load_data function returns directed PyG graphs (only "reddit" is undirected, for some reason). Here is the output of the is_undirected method:

```
inj_cora False
inj_amazon False
inj_flickr False
weibo False
reddit True
disney False
books False
enron False
```
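For readers unfamiliar with what this check tests, here is a minimal, dependency-free sketch. It is an illustration of the idea only, not PyG's implementation: the function below is a simplified stand-in that works on plain Python edge tuples and ignores edge attributes, and the toy graphs are hypothetical rather than taken from the BOND datasets.

```python
def is_undirected(edges):
    """Return True if every edge (u, v) has a matching reverse edge (v, u).

    Simplified stand-in for illustration; operates on a list of
    (source, target) tuples instead of a PyG edge_index tensor.
    """
    edge_set = set(edges)
    return all((v, u) in edge_set for u, v in edge_set)


# Hypothetical toy graphs, not taken from the BOND datasets.
one_way = [(0, 1), (1, 2)]                     # each edge stored once
both_ways = [(0, 1), (1, 0), (1, 2), (2, 1)]   # each edge stored in both directions

print(is_undirected(one_way))    # False
print(is_undirected(both_ways))  # True
```

A graph that stores each undirected edge only once fails this check, which would match the `False` results reported above.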

To Reproduce

Here is a Colab notebook that reproduces the output above: https://colab.research.google.com/drive/1mNXh66Ac2hUduHvzKtGifC7_huBgCf-5?usp=sharing

Expected behavior

I expected the data to be consistent with what is stated in the paper. Please let me know whether I misunderstood something or this is indeed a mistake. Thanks!

yzfxmu commented 1 year ago

PyG treats all graphs as directed. In this case, the datasets mentioned above have edges in both directions.

realfolkcode commented 1 year ago

@yzfxmu it is true that PyG represents all graphs as directed. As you correctly pointed out, the edges of an undirected graph are stored as bidirectional, so each edge should appear twice in edge_index (assuming there are no self-loops). However, this is not the case for most of the graphs in BOND, as can be checked with the is_undirected method. A possible consequence is suboptimal performance of message passing layers, since messages would flow along such edges in one direction only.
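To make the point concrete, the sketch below finds the reverse edges that are missing from an edge list and then symmetrizes it. It is an illustrative assumption only: the helper names are hypothetical, the toy edge list is not from BOND, and the code works on plain Python tuples rather than an edge_index tensor. In PyG itself, `torch_geometric.utils.to_undirected` serves the same purpose of adding the missing reverse edges.

```python
def missing_reverse_edges(edges):
    """List the reverse edges (v, u) that are absent for some stored edge (u, v)."""
    edge_set = set(edges)
    return sorted((v, u) for u, v in edge_set if (v, u) not in edge_set)


def symmetrize(edges):
    """Return a sorted edge list containing every edge in both directions."""
    edge_set = set(edges)
    edge_set |= {(v, u) for u, v in edges}
    return sorted(edge_set)


# Hypothetical toy graph: edge 0-1 is stored once, edge 1-2 in both directions.
edges = [(0, 1), (1, 2), (2, 1)]

print(missing_reverse_edges(edges))  # [(1, 0)]
print(symmetrize(edges))             # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

After symmetrizing, every edge appears twice, which is the layout PyG expects for an undirected graph without self-loops.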

kayzliu commented 1 year ago

Thanks for pointing out the problem. After carefully reviewing the datasets, we summarize the results as follows:

In summary, we apologize for the confusion caused by our mistakes, but we are not able to change the datasets, because the results reported in the paper are based on them. We provide potential solutions for each dataset.