Using DGCNN for planted clique classification - Githubissues

muhanzhang / DGCNN

Code for "M. Zhang, Z. Cui, M. Neumann, and Y. Chen, An End-to-End Deep Learning Architecture for Graph Classification, AAAI-18".

MIT License

174 stars 44 forks source link

Using DGCNN for planted clique classification #1

Closed Totoketchup closed 5 years ago

Totoketchup commented 6 years ago

I'm trying to run DGCNN on my dataset consisting in graphs of 100 nodes in which a k-clique (k=15) has been added or not. Unfortunately, the accuracy is staying at 50% while state of the art can perform 100%. I tried with the following architecture: th main.lua -dataName PLANTED -learningRate 1e-4 -maxEpoch 300 -nodeLabel nDegree -k 30 -outputChannels '32 32 32 1' What do you think the problem can be here? Is it because of the high density of the adjacency matrix of such graphs ?

muhanzhang commented 6 years ago

I am not sure about the reason. Have you tried to set -k 100 ? Since your graphs have the same size, it is reasonable to set k to their size so that all the node features are preserved after SortPooling. Let me know if it has solved your issue.

Totoketchup commented 6 years ago

Thank you very much for your advice. I just tried -k 100 but the accuracy is still oscillating around 50%. I can send you the dataset if you want to reproduce it.

muhanzhang commented 6 years ago

Ok. Please send it to me. I will try it some time later. Thanks!

Totoketchup commented 6 years ago

I sent it to you via your mail.

muhanzhang commented 6 years ago

I found the problem description in https://en.wikipedia.org/wiki/Planted_clique. It's an interesting problem. I have tried the dataset using various configs, and I am getting 50% accuracy too. The training is stuck from the beginning. Have you tried other graph classification methods, such as graph kernels, other than the state-of-the-art methods (based on graph theory)? Can they achieve better performance? I am curious about whether this problem can be solved using a general graph classification algorithm, instead of using an algorithm specifically designed for it.

Totoketchup commented 6 years ago

Thank you for your answer and for having tried the dataset. With the Shortest Path Kernel, in this paper (http://proceedings.mlr.press/v32/johansson14.pdf), they reached around 60% of accuracy with graphs of 200 nodes, a 14-clique and 100 samples. Today, we can achieve about 62% of accuracy with a graph of 1000 nodes, with 30-clique, and 1000 samples. Furthermore, Walk2Vec-SC algorithm (https://arxiv.org/pdf/1704.05516.pdf) classified graphs with 1000 nodes, 31-clique, with an AUC of 0.71. It's quite interesting to see that DGCNN is struggling to classify such planted-graphs. I tried with 100 nodes and various size of clique, and DGCNN is taking off 50% from a clique size of 91 (that is quite big) and has an accuracy of 59%. Then for (K / accuracy) we get: (92 / 70%), (93 / 75.5%), (94 / 91.5%), (95/ 97%) and (96 / 100%).

muhanzhang commented 6 years ago

Thank you for letting me know the performance of DGCNN on this task. It would be very interesting to analyze why DGCNN doesn't work well here. It's also interesting to see if other propagation-based methods (WL kernel, PK kernel, and other graph neural networks) struggle here. If so, then propagation maybe the reason.

Totoketchup commented 6 years ago

Sorry for this late reply. If I remember well, today's best method is using random walks with Walk2Vec. I tried to run Graph2Vec on this problem as well but it stucks at 50% like DGCNN. This might be interesting to see what are current neural networks architecture for graphs 'missing' for the planted-clique problem.