Inquiry about metapaths from 2017 Paper "Systematic Integration of Biomedical Knowledge Prioritizes Drugs for Repurposing"

hetio / hetionet

Hetionet: an integrative network of disease

https://neo4j.het.io


Inquiry about metapaths from 2017 Paper "Systematic Integration of Biomedical Knowledge Prioritizes Drugs for Repurposing" #58

ferry309 closed this issue 4 months ago

ferry309 commented 1 year ago

Hi, I am a postgraduate student working on domain adaptation of pre-trained language models. I've been following your work on biomedical data integration.

I was particularly intrigued by your 2017 paper titled "Systematic Integration of Biomedical Knowledge Prioritizes Drugs for Repurposing." In it, you mentioned that 709 of the 1,206 metapaths exhibited a statistically significant Δ AUROC at a false discovery rate cutoff of 5%. However, while trying to replicate some of the results and dig deeper into the open-source data, I was unable to locate these 709 metapaths. Would it be possible for you to provide the specific metapaths and their instance paths? I am keen to explore these paths further, and your assistance would be a great help as I continue working in the biomedical domain.

dhimmel commented 1 year ago

Quoting from the manuscript:

"Overall, 709 of the 1,206 metapaths exhibited a statistically significant Δ AUROC at a false discovery rate cutoff of 5%. These 709 metapaths included all 24 metaedges, suggesting that each type of relationship we integrated provided at least some therapeutic utility."

"I was unable to locate these 709 metapaths."

We have an interactive table of the metapaths here, but it doesn't appear to include the FDR-adjusted p-values.

I think the dataset you want is all-features/data/feature-performance/auroc.tsv. We then computed the FDR using the following R command in 6-rvisualize.ipynb:

fdr_delta_auroc = p.adjust(p = pval_delta_auroc, method = 'fdr')
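For readers working outside R, the adjustment that `p.adjust(..., method = 'fdr')` performs is the Benjamini-Hochberg procedure. The following is a minimal pure-Python sketch of that procedure, not the project's own code; the function name `bh_adjust` and the example p-values are made up for illustration.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Illustrative re-implementation of what R's p.adjust(method='fdr')
    computes; the name bh_adjust is hypothetical.
    """
    n = len(pvals)
    # Visit p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank when sorted ascending
        candidate = pvals[idx] * n / rank
        # Enforce monotonicity: an adjusted value never exceeds
        # the adjusted value of any larger raw p-value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values, for illustration only.
example = [0.01, 0.04, 0.03, 0.005]
print(bh_adjust(example))
```

A feature would then count as significant at a 5% false discovery rate when its adjusted value is below 0.05.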

I think we also saved the FDR-adjusted p-values in 5-primary-aucs.ipynb to data/feature-performance/primary-aurocs.tsv. If you filter this dataset to feature_type == "dwpc" and fdr_pval_auroc < 0.05, I hope you get 709 rows 😃
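The filter described above could be sketched in Python roughly as follows, taking the 5% false discovery rate cutoff quoted from the manuscript as 0.05. The column names `feature_type` and `fdr_pval_auroc` come from this thread; the `feature` column, the function name, and every row in `SAMPLE_TSV` are invented placeholders, so the real file may be laid out differently.

```python
import csv
import io


def significant_dwpc_rows(tsv_text, cutoff=0.05):
    """Return rows with feature_type == 'dwpc' and fdr_pval_auroc < cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row
        for row in reader
        if row["feature_type"] == "dwpc"
        and float(row["fdr_pval_auroc"]) < cutoff
    ]


# Invented stand-in for primary-aurocs.tsv; none of these values are real.
SAMPLE_TSV = (
    "feature\tfeature_type\tfdr_pval_auroc\n"
    "metapath_A\tdwpc\t0.001\n"
    "metapath_B\tdwpc\t0.20\n"
    "degree_X\tdegree\t0.01\n"
)

significant = significant_dwpc_rows(SAMPLE_TSV)
print([row["feature"] for row in significant])
```

To run it on the real data, the text of the downloaded TSV file would be passed in place of `SAMPLE_TSV`.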

ferry309 commented 1 year ago

Thank you very much for your prompt reply.

I have successfully identified 1069 metapaths that meet the above criteria. My next objective is to find the instance paths for these metapaths. From my understanding, and based on the information you've provided, it seems you have generated query statements for each metapath to measure their effectiveness as features. Do you have the instance paths generated during the query process for metapaths?

If these data are not available, would I need to run the queries one by one on Neo4j to retrieve the paths for every metapath? Given that the Neo4j instance at https://neo4j.het.io/ often times out, this approach seems impractical.

Could you advise on the best course of action to obtain these data? Any suggestions or alternative methods you could provide would be immensely helpful.

dhimmel commented 1 year ago

"Do you have the instance paths generated during the query process for metapaths?"

We do not store actual paths corresponding to source node, target node, metapath combinations. Instead we generate them on the fly via Cypher queries to Neo4j.

When the path count is large, i.e. over 10,000, then I don't suggest trying to generate all paths. I don't see a valid use case for generating such a large number of paths though. When the path count is that large, any individual path tends to be pretty meaningless.
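As a rough illustration of generating paths on the fly, the sketch below only builds a Cypher query string for one metapath and caps the result with a LIMIT clause, in line with the advice against enumerating very large path sets. It is not the project's query generator. The node labels (`Compound`, `Gene`, `Disease`), the relationship type names (`BINDS_CbG`, `ASSOCIATES_DaG`), the `name` property, and the function name are assumptions about the Hetionet Neo4j schema that should be checked against the database before use.

```python
def build_path_query(node_labels, rel_types, limit=100):
    """Build a Cypher query that returns up to `limit` paths for a metapath.

    node_labels: node labels along the metapath, source first.
    rel_types: relationship types connecting consecutive nodes.
    All labels, relationship types, and the `name` property are assumed,
    not confirmed by this thread.
    """
    if len(node_labels) != len(rel_types) + 1:
        raise ValueError("need exactly one more node label than relationship type")
    parts = [f"(n0:{node_labels[0]})"]
    for i, (rel, label) in enumerate(zip(rel_types, node_labels[1:]), start=1):
        # Relationships are matched without an arrow, i.e. in either orientation.
        parts.append(f"-[:{rel}]-(n{i}:{label})")
    pattern = "".join(parts)
    last = len(node_labels) - 1
    return (
        f"MATCH path = {pattern}\n"
        f"WHERE n0.name = $source AND n{last}.name = $target\n"
        f"RETURN path\n"
        f"LIMIT {int(limit)}"
    )


query = build_path_query(
    ["Compound", "Gene", "Disease"],
    ["BINDS_CbG", "ASSOCIATES_DaG"],
    limit=10,
)
print(query)
```

The `$source` and `$target` placeholders are query parameters that a Neo4j client would supply at execution time.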

Also note our recent publication, "Hetnet connectivity search provides rapid insights into how two biomedical entities are related."

ferry309 commented 1 year ago

Thanks a lot! I also have a question about the undirected metaedges in the paper. The last sentence of the first paragraph on page 7 says: "Note that all metaedges besides Gene->regulates->Gene are undirected." Taking Anatomy–upregulates–Gene as an example, we cannot say Gene->upregulates->Anatomy, but rather Anatomy->was upregulated->Gene. Isn't this just a directed edge?

dhimmel commented 1 year ago

"question about the undirected metaedges in the paper"

See related issue https://github.com/hetio/hetionet/issues/23.

Whether a metaedge/edge is directional or symmetric is a distinction that is most relevant when the source and target metanode are the same. When there are different source and target metanodes, we encoded "directionality" as different metaedges like:

Anatomy–upregulates–Gene
Anatomy–downregulates–Gene

ferry309 commented 1 year ago

So you mean that different edges between the same pair of node types are used to express directionality. However, the Compound and Disease pair does not have different edges to represent direction. Instead, the same edge is traversed in the reverse direction within a metapath, e.g., Compound–palliates–Disease–palliates–Compound–treats–Disease. So I'm confused about how to distinguish the direction, or whether all edges in the meta-knowledge graph are bidirectional, even Anatomy–upregulates–Gene and Anatomy–downregulates–Gene.

dhimmel commented 11 months ago

Compound–palliates–Disease and Disease–palliates–Compound are the same edge type, just with different orientations. There is no difference in the semantic meaning between the two, which is why we consider the bipartite edges in Hetionet as bidirectional.
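One way to picture this is to store each edge in a canonical form so that both orientations of an undirected edge compare equal, while a directed edge such as Gene->regulates->Gene keeps its order. The sketch below is only an illustration of that idea, not how Hetionet represents edges internally; the function name and the gene identifiers are placeholders.

```python
def canonical_edge(source, kind, target, directed=False):
    """Return a comparable key for an edge.

    Undirected edges get their endpoints sorted, so either orientation
    yields the same key; directed edges keep source-to-target order.
    """
    if directed:
        return (source, kind, target)
    first, second = sorted([source, target])
    return (first, kind, second)


# Both orientations of an undirected edge map to the same key.
forward = canonical_edge("Compound", "palliates", "Disease")
backward = canonical_edge("Disease", "palliates", "Compound")
print(forward == backward)

# A directed edge between two nodes of the same type keeps its orientation.
# gene_a and gene_b are placeholder identifiers.
ab = canonical_edge("gene_a", "regulates", "gene_b", directed=True)
ba = canonical_edge("gene_b", "regulates", "gene_a", directed=True)
print(ab == ba)
```

Under this view, direction only carries information when the edge is marked as directed, which matches the point that orientation matters mainly when source and target metanodes are the same.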