yueliu1999 / Dink-Net

[ICML 2023] An official source code for paper "Dink-Net: Neural Clustering on Large Graphs".

end-to-end issue #4

Open · Yide-Qiu opened this issue 8 months ago

Yide-Qiu commented 8 months ago

Hello. In your paper you mention that this work "was unified into an end-to-end framework". However, in your published code: 1) you directly use the OGB-supplied features instead of the raw text attributes; 2) the pipeline includes an unavoidable pre-training stage. Do you still consider this work "end-to-end", and why? Looking forward to your reply.

yueliu1999 commented 8 months ago

Hi, thanks for your attention.

Most deep graph clustering methods, and indeed most graph representation learning methods, directly use the OGB-supplied features rather than the raw text. We consider this the default setting. Recently, benefiting from the strong general knowledge of LLMs, a few methods [1, 2] process the raw text with LLMs. This is a promising direction.
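For reference, a minimal sketch of what "using the OGB-supplied features" means in practice; the dataset name and root path are illustrative, and this is not the repository's actual data-loading code:

```python
# Minimal sketch: load the pre-computed node features supplied by OGB
# (not the raw text attributes). Assumes `ogb` and `torch_geometric`
# are installed; dataset name and root directory are illustrative.
from ogb.nodeproppred import PygNodePropPredDataset

dataset = PygNodePropPredDataset(name="ogbn-arxiv", root="./dataset")
data = dataset[0]

# data.x holds the OGB-supplied node features (for ogbn-arxiv, 128-dim
# averaged word embeddings of title/abstract), which most graph clustering
# and representation learning methods consume directly.
print(data.x.shape)           # e.g. torch.Size([169343, 128])
print(data.edge_index.shape)  # COO edge list
```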

The purpose of the pre-training stage is to obtain the initialized cluster center embeddings. Pre-training is a widely used technique in graph learning, CV, and NLP. The related competitor S3GC [3] first performs graph representation learning and then directly runs k-means on the learned node embeddings; we consider this two-stage process to separate representation learning from clustering optimization. Therefore, we first pre-train the encoders, and then, in the fine-tuning stage, we unify representation learning and clustering optimization into an end-to-end framework.
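To make the two stages concrete, here is a minimal sketch under assumed names: `encoder` is any GNN mapping `(x, edge_index)` to node embeddings, `ssl_loss` is a stand-in self-supervised objective, and the clustering term is a simple placeholder rather than the actual Dink-Net clustering losses:

```python
# Sketch of "pre-train, then fine-tune end-to-end"; placeholders only,
# not the exact Dink-Net implementation.
import torch
from sklearn.cluster import KMeans

def pretrain_then_finetune(encoder, data, k, ssl_loss, epochs=(100, 100), lr=1e-3):
    pre_epochs, ft_epochs = epochs

    # Stage 1: pre-training, representation learning only.
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(pre_epochs):
        opt.zero_grad()
        loss = ssl_loss(encoder(data.x, data.edge_index))
        loss.backward()
        opt.step()

    # Initialize learnable cluster centers with k-means on the pre-trained
    # embeddings; obtaining these initial centers is what pre-training is for.
    with torch.no_grad():
        z = encoder(data.x, data.edge_index)
    km = KMeans(n_clusters=k, n_init=10).fit(z.cpu().numpy())
    centers = torch.nn.Parameter(
        torch.as_tensor(km.cluster_centers_, dtype=z.dtype, device=z.device)
    )

    # Stage 2: fine-tuning, optimize the encoder AND the centers jointly,
    # so representation learning and clustering share one end-to-end objective.
    opt = torch.optim.Adam(list(encoder.parameters()) + [centers], lr=lr)
    for _ in range(ft_epochs):
        opt.zero_grad()
        z = encoder(data.x, data.edge_index)
        dist = torch.cdist(z, centers)                # node-to-center distances
        cluster_loss = dist.min(dim=1).values.mean()  # placeholder clustering term
        loss = ssl_loss(z) + cluster_loss
        loss.backward()
        opt.step()

    return encoder, centers
```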

Without pre-training, i.e., training the whole network from scratch, it is hard to achieve promising performance, especially on the purely unsupervised clustering task. There are some methods [4, 5] that are free from pre-training, but they follow the same pattern as S3GC: they first perform representation learning and then run k-means, as sketched below. If you have any questions or suggestions, feel free to contact me on WeChat: ly13081857311. Issues and pull requests are also welcome.
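For contrast, a sketch of that two-stage alternative, using the same assumed `encoder` and `ssl_loss` placeholders as above; here the clustering step is post-hoc and sends no gradient back into representation learning:

```python
# Sketch of the two-stage scheme (representation learning, then k-means);
# placeholders only, not the implementation of any specific method.
import torch
from sklearn.cluster import KMeans

def two_stage_clustering(encoder, data, k, ssl_loss, epochs=100, lr=1e-3):
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):                  # representation learning only
        opt.zero_grad()
        loss = ssl_loss(encoder(data.x, data.edge_index))
        loss.backward()
        opt.step()

    with torch.no_grad():                    # clustering is a separate post-hoc step
        z = encoder(data.x, data.edge_index)
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(z.cpu().numpy())
    return labels                            # one cluster assignment per node
```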

[1] He X, Bresson X, Laurent T, et al. Explanations as Features: LLM-Based Features for Text-Attributed Graphs. arXiv preprint arXiv:2305.19523, 2023.
[2] Zhao J, Zhuo L, Shen Y, et al. GraphText: Graph Reasoning in Text Space. arXiv preprint arXiv:2310.01089, 2023.
[3] Devvrit F, Sinha A, Dhillon I, et al. S3GC: Scalable Self-Supervised Graph Clustering. Advances in Neural Information Processing Systems, 2022, 35: 3248-3261.
[4] Liu Y, Yang X, Zhou S, et al. Simple Contrastive Graph Clustering. IEEE Transactions on Neural Networks and Learning Systems, 2023.
[5] Liu Y, Yang X, Zhou S, et al. Hard Sample Aware Network for Contrastive Deep Graph Clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(7): 8914-8922.

Yide-Qiu commented 8 months ago

Thanks for your detailed reply. Pre-training to obtain the initial cluster embeddings is indeed a good idea. :)