isjakewong / MIRACLE

Multi-view Graph Contrastive Representation Learning for Drug-drug Interaction Prediction
https://arxiv.org/abs/2010.11711

Confusing issue about ablation study on MIRACLE #2

Closed: ZillaRU closed this issue 2 years ago

ZillaRU commented 2 years ago

I have constructed a new DDI dataset that is larger than the ones you provided. I am confused because using only the GCN from MIRACLE (the part that learns from observed DDIs) performs better than using the entire MIRACLE model. With no outputs from BAMPN (the part that learns from molecular graphs) and no other node information, I simply initialized the node features for the first GCN layer from a standard normal distribution. I have tried several times, and without exception the performance ranks GCN-only > BAMPN+GCN > BAMPN-only. How should I tune the hyperparameters so that the intra-view and inter-view information is fused better? Any help? Thanks in advance.

hzcheney commented 2 years ago

I have the same doubt. I plan to use this method as a baseline for my work, but it seems to perform poorly: the AUC gets close to 0.6 but never higher.

ZillaRU commented 2 years ago

I tested MIRACLE on a dataset I collected myself. It did reach a high AUROC. But with only part of the model, it performed even better. Amazing...🙃

hzcheney commented 2 years ago

I am curious about the results on your newly collected dataset. Could you describe it more specifically? I am also running MIRACLE as a baseline for my current work, but it fails to reach high performance on three open datasets. Lastly, could you please describe how you prepared the dataset for MIRACLE? I want to check whether I processed the dataset incorrectly. Thank you very much! Hope you have a good day! :)

isjakewong commented 2 years ago

@ZillaRU The performance mainly depends on the properties of the dataset. In general, the DDI network is highly heterophilous, so drug molecular information will undoubtedly improve the representations learned by GNNs. However, if the DDI network of your dataset has strong homophily, the performance can indeed be good even without molecular information. Check whether your dataset is homophilous or not! A method for this assessment is described in: Adaptive Universal Generalized PageRank Graph Neural Network [Eli Chien et al., 2021]. Another concern is your initialization method. Have you tried other initializations of the node attributes, e.g., random initialization or one-hot encodings? Different initializations also influence training, convergence, and prediction.
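
A minimal sketch of the initialization alternatives mentioned above, in plain Python so it runs without any framework. This is not MIRACLE's own code: the helper names (`normal_init`, `uniform_init`, `one_hot_init`), the `scale` value, and the seeds are assumptions made for illustration only.

```python
import random


def normal_init(num_nodes, dim, seed=0):
    """Draw every feature independently from a standard normal N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_nodes)]


def uniform_init(num_nodes, dim, scale=0.1, seed=0):
    """Draw every feature uniformly from [-scale, scale] (scale is an assumed value)."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(num_nodes)]


def one_hot_init(num_nodes):
    """Give node i a unit vector with a 1 at position i (an identity matrix)."""
    return [[1.0 if i == j else 0.0 for j in range(num_nodes)]
            for i in range(num_nodes)]


# Hypothetical sizes: 4 drug nodes, 3-dimensional random features.
features_normal = normal_init(4, 3)
features_uniform = uniform_init(4, 3)
features_one_hot = one_hot_init(4)
```

Note that one-hot features have as many dimensions as there are nodes, so the first GCN layer's input size would have to match the number of drugs.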

isjakewong commented 2 years ago

@hzcheney Yes. I do believe it relates to how you process your dataset. I am more than willing to check it. Email me!

ZillaRU commented 2 years ago

Thank you for your kind reply! I will check and compare the homophily of my datasets and the datasets in your paper. I hope this comparison resolves my doubts.

isjakewong commented 2 years ago

No problem! We can have a further discussion in detail via email when you have new results! : )

ZillaRU commented 2 years ago

Have you checked the homophily of the datasets in your study? For my dataset, it's hard to find proper labels/classes for the drug nodes and then measure the level of node homophily as Eli Chien et al. did. Are proper labels available for the datasets used in your study?😂

isjakewong commented 2 years ago

I actually didn't do that. A possible way to label your drugs is by their scaffolds (as we do in scaffold splitting).
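
A minimal sketch of how scaffold labels could be used to estimate node homophily on a DDI graph: for each drug, take the fraction of its neighbors that share its scaffold, then average over drugs. It assumes the scaffold of each drug has already been computed elsewhere (for example as a Murcko scaffold SMILES string from a cheminformatics toolkit). The function name `node_homophily`, the drug IDs, the scaffold strings, and the edges are hypothetical toy data, not taken from MIRACLE or its datasets.

```python
from collections import defaultdict


def node_homophily(edges, labels):
    """Mean, over labeled nodes with at least one labeled neighbor,
    of the fraction of neighbors that share the node's label."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)

    ratios = []
    for node, nbrs in neighbors.items():
        if node not in labels:
            continue
        labeled = [n for n in nbrs if n in labels]
        if not labeled:
            continue
        same = sum(1 for n in labeled if labels[n] == labels[node])
        ratios.append(same / len(labeled))

    # NaN signals that no node could be scored.
    return sum(ratios) / len(ratios) if ratios else float("nan")


# Hypothetical toy data: drug ID -> precomputed scaffold string (used as class label).
scaffold_of = {"A": "c1ccccc1", "B": "c1ccccc1", "C": "c1ccncc1", "D": "c1ccncc1"}
# Hypothetical observed DDIs, treated as undirected edges.
ddi_edges = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")]

score = node_homophily(ddi_edges, scaffold_of)
```

Values near 1 suggest that interacting drugs mostly share a scaffold (homophily), while values near 0 suggest heterophily. With many distinct scaffolds the score tends to be low by construction, so comparing it across datasets is more informative than reading it in isolation.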