storyandwine / LAGCN

Code and Datasets for "Predicting Drug-Disease Associations through Layer Attention Graph Convolutional Networks"

raw data #4

Closed shahinghasemi closed 2 years ago

shahinghasemi commented 3 years ago

Hello, thanks for your great work. Would it be possible to send me all the raw data (not yet processed by your method)? (My email: shahinghasemi.me@gmail.com) Another question: is it possible to add further disease information, e.g. side effects, targets for drugs, etc., to your framework?

Thanks in advance.

storyandwine commented 3 years ago

Hi Shahin Ghasemi, thanks for your interest! The unprocessed source data from the paper "Predicting drug-disease associations by using similarity constrained matrix factorization" has been packaged and sent to your mailbox. As for further disease information, there are two simple ways to incorporate it: one is to treat it as features in the GCN; the other is to calculate a similarity from it and add that similarity to the heterogeneous graph. Best wishes!
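For readers following the thread, the second option (adding similarities to the heterogeneous graph) can be sketched roughly as below. This is only an illustration under stated assumptions, not the repository's code: the function name `heterogeneous_adjacency` is hypothetical, and the block layout (drug similarity and disease similarity on the diagonal, the association matrix off the diagonal) is one common way to assemble such a graph. A similarity derived from extra information would be passed in as `drug_sim` or `disease_sim`, or merged with the existing one beforehand.

```python
import numpy as np


def heterogeneous_adjacency(drug_sim, disease_sim, associations):
    """Stack similarities and associations into one (n+m) x (n+m) matrix.

    drug_sim:     (n, n) drug-drug similarity
    disease_sim:  (m, m) disease-disease similarity
    associations: (n, m) binary drug-disease associations
    Hypothetical helper for illustration; not taken from the LAGCN code.
    """
    drug_sim = np.asarray(drug_sim, dtype=float)
    disease_sim = np.asarray(disease_sim, dtype=float)
    associations = np.asarray(associations, dtype=float)
    # Drugs occupy the first n rows/columns, diseases the remaining m.
    return np.block([
        [drug_sim, associations],
        [associations.T, disease_sim],
    ])
```

With 269 drugs and 598 diseases, the result would be an 867 x 867 matrix whose symmetry follows from the symmetry of the two similarity matrices.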

storyandwine commented 3 years ago

raw_data.zip

shahinghasemi commented 3 years ago

Thanks for your quick reply. Aren't there any drug-related features? These are the ones mentioned in your paper:

  • Target
  • Enzyme
  • Drug–drug interactions
  • Pathway
  • Substructure

Also, isn't there a disease-disease similarity matrix calculated using this method? You cited the cbPred paper, which itself uses a disease-disease similarity matrix computed with the aforementioned method.

storyandwine commented 3 years ago

Hi, drug-related features can be downloaded from https://www.drugbank.com/ using the IDs in the raw data. Our implementation of disease-disease similarity is available at https://github.com/BioMedicalBigDataMiningLab/Disease-Mesh-Similarity

storyandwine commented 3 years ago

The drug-related feature matrices are also available at https://github.com/xiangyue9607/SCMFDD/blob/master/SCMFDD_Dataset.mat

shahinghasemi commented 3 years ago

Thank you so much for the references. Since my work is based on yours, I've been reading your paper, and these are the assumptions I have made from it that were not reflected in the code. Please correct them if they are wrong:

  1. You calculated the drug-drug similarity matrix using drug-related features, e.g. targets, pathways, enzymes, substructures and drug-drug interactions.
    • If yes, then I'm looking for this data too, e.g. a (269*X) matrix for each of them, because my data should be exactly the same as yours.
    • How did you combine all those similarity matrices into the single pairwise drug similarity matrix?
  2. You calculated the disease-disease matrix from MeSH terms based on this method, whose implementation you also referenced. Where can I find this raw 598*598 matrix?

thanks in advance.

storyandwine commented 3 years ago

Hi, you can download this .mat file and find all the 269*X and 598*598 matrices in it: https://github.com/xiangyue9607/SCMFDD/blob/master/SCMFDD_Dataset.mat

Rasia123 commented 2 years ago

@storyandwine such incredible work! I was looking through the .mat files but I am unable to view them. Is there any alternative way to download them? I would be really grateful for the help.

storyandwine commented 2 years ago

Maybe you should use MATLAB to open it? Or use scipy.io.loadmat in Python.
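As a minimal sketch of the Python route, the snippet below lists the variables stored in a .mat file together with their shapes. The helper name `summarize_mat` and the local file path in the usage comment are assumptions for illustration. Note that `scipy.io.loadmat` reads MATLAB files up to version 7.2; files saved in the v7.3 (HDF5) format need a different reader.

```python
from scipy.io import loadmat


def summarize_mat(path):
    """Return {variable name: array shape} for a MATLAB .mat file.

    Hypothetical helper for illustration only.
    """
    data = loadmat(path)
    # loadmat also returns metadata entries such as __header__; skip them.
    return {
        name: value.shape
        for name, value in data.items()
        if not name.startswith("__")
    }


# Usage, assuming the file was downloaded to the working directory:
# for name, shape in summarize_mat("SCMFDD_Dataset.mat").items():
#     print(name, shape)
```

Individual matrices can then be read from the dictionary returned by `loadmat` by key, for example `loadmat(path)["target_feature_matrix"]`.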

Rasia123 commented 2 years ago

Thank you so much for your quick response.

shahinghasemi commented 2 years ago

Hello again, sorry for the interruption. This is extracted from the paper:

Based on the above discussion, we adopt the target-based drug–drug similarities calculated by the Jaccard index, MeSH-based disease–disease similarities and drug–disease associations to construct the heterogeneous network and then build the LAGCN models in the following study.

As stated, the drug similarity matrix is

  1. based on the target feature (presumably included in the matrix data that you referenced)
  2. calculated using the Jaccard index.

However, I calculated the Jaccard similarity of target_feature_matrix from the data you referenced, but the resulting matrix differs from the one you used. Your drug similarity matrix doesn't satisfy the properties of a Jaccard similarity matrix, for example:

What am I missing?

Thanks in advance.

storyandwine commented 2 years ago

Hello shahinghasemi,

  1. matrix[i, j] is equal to matrix[j, i]. For floating-point values, simply using == is not correct; please use cmath.isclose(). For more floating-point basics, see https://docs.microsoft.com/en-us/cpp/c-language/type-float?view=msvc-160.
  2. matrix[i, i] represents self-connections, which are already considered in the GCN, so there is no need to set matrix[i, i]=1.
  3. The original code for the quick Jaccard calculation is available in the same folder as the data. For details, see https://github.com/xiangyue9607/SCMFDD/blob/master/Experiment_SCMFDD_All.m#L279

Best wishes!
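The points above can be illustrated with a small NumPy sketch: a pairwise Jaccard index over the rows of a binary feature matrix, plus a symmetry check that uses a tolerance rather than `==`. This is not the SCMFDD MATLAB code linked above. The function names, the `zero_diagonal` option (mirroring the remark that the diagonal need not be 1) and the choice to return 0 for two empty rows are assumptions made for this example.

```python
import numpy as np


def jaccard_similarity(features, zero_diagonal=False):
    """Pairwise Jaccard index between the rows of a binary feature matrix."""
    x = (np.asarray(features) > 0).astype(float)
    intersection = x @ x.T  # shared features for every pair of rows
    row_sums = x.sum(axis=1)
    union = row_sums[:, None] + row_sums[None, :] - intersection
    # Two rows with no features give 0/0; this sketch defines that case as 0.
    similarity = np.divide(
        intersection, union, out=np.zeros_like(intersection), where=union > 0
    )
    if zero_diagonal:
        # Assumption: drop self-similarity when self-loops are handled elsewhere.
        np.fill_diagonal(similarity, 0.0)
    return similarity


def is_symmetric(matrix, tol=1e-8):
    """Compare a matrix with its transpose using a tolerance, not ==."""
    matrix = np.asarray(matrix, dtype=float)
    return np.allclose(matrix, matrix.T, atol=tol)
```

Comparing a recomputed matrix against a published one is best done the same way, for instance with `np.allclose(a, b)`, and with the diagonal convention matched first.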

shahinghasemi commented 2 years ago

Hello dear @storyandwine, I was wondering how I can find the drug-drug interaction matrix. The drug-drug interaction matrix in this data has shape (269, 2002), so it is not a symmetric (269, 269) matrix. I want to use it as the adjacency matrix for the GCN operators.

storyandwine commented 2 years ago

Hi, there are several ways to get drug-drug interactions. First, you can get them from the DrugBank database (https://go.drugbank.com/drug-interaction-checker). Second, you can match the IDs against a drug-drug interaction dataset, for example https://github.com/BioMedicalBigDataMiningLab/SFLLN
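The ID-matching route can be sketched as follows, assuming the interactions have already been collected as pairs of drug IDs. The helper name `interaction_adjacency` and the placeholder IDs in the usage note are hypothetical. The sketch builds a symmetric 0/1 matrix whose rows and columns follow the order of the supplied drug list, which is the shape a GCN adjacency input would need.

```python
import numpy as np


def interaction_adjacency(drug_ids, interaction_pairs):
    """Build a symmetric 0/1 drug-drug adjacency matrix from ID pairs.

    drug_ids:          ordered list of drug identifiers (defines row order)
    interaction_pairs: iterable of (id_a, id_b) tuples
    Hypothetical helper for illustration only.
    """
    index = {drug_id: i for i, drug_id in enumerate(drug_ids)}
    adjacency = np.zeros((len(drug_ids), len(drug_ids)), dtype=np.float32)
    for a, b in interaction_pairs:
        # Skip pairs involving drugs outside the list, and self-pairs.
        if a in index and b in index and a != b:
            adjacency[index[a], index[b]] = 1.0
            adjacency[index[b], index[a]] = 1.0
    return adjacency
```

For example, `interaction_adjacency(["D1", "D2", "D3"], [("D1", "D3")])` gives a 3 x 3 matrix with ones only at positions (0, 2) and (2, 0). With the 269 drugs in this dataset the result would be 269 x 269.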

shahinghasemi commented 2 years ago

@storyandwine Thanks for the reply. I don't have access to the drug list. These are the keys included in the matrix data:

dict_keys(['__header__', '__version__', '__globals__', 'target_sequence_list', 'chemical_list', 'disease_list', 'structure_feature_matrix', 'target_feature_matrix', 'enzyme_feature_matrix', 'target_similarity_matrix', 'normalized_dis_similairty_matrix', 'drug_disease_association_matrix', 'pathway_feature_matrix', 'drug_drug_interaction_feature_matrix'])

How can I find the list of the 269 drugs so that I can look up their interactions on DrugBank?

storyandwine commented 2 years ago

'chemical_list'