hetio / hetionet

Hetionet: an integrative network of disease
https://neo4j.het.io

Connectivity Search Automated Query Question #53

Closed: ayujain04 closed this issue 1 year ago

ayujain04 commented 1 year ago

Hey!

Is there a way for me to query Hetionet to get a sum of the path scores of all metapaths from a source node to a target node? I attached screenshots showing how I can do this with the connectivity search GUI. However, I would like to run many such queries and get this sum for each one, without having to manually enter every pair into the website.

[Two screenshots attached: "Screenshot 2023-07-11 at 9 15 53 PM" and "Screenshot 2023-07-11 at 9 17 14 PM"]

ayujain04 commented 1 year ago

Another possibility: for each (drug, disease) pair of interest, could I automatically extract the list of metapaths as a CSV file?

dhimmel commented 1 year ago

> get a sum of the path scores of all metapaths from a source node to a target node

Summing all the path scores is not a metric we have explored. For reference, this manuscript defines the path score as follows:

The path score equals the proportion of the DWPC contributed by a path multiplied by the magnitude of the DWPC’s p-value (-log10(p)).

Within a single metapath, the proportions of the DWPC contributed by its paths sum to 1, so the path scores of that metapath sum to its -log10(p-value). Therefore, if you want to sum path scores across all metapaths, you don't actually need to know the individual paths: you can sum the -log10(p-value) of each metapath. You can get p-values for metapaths whose significance exceeds the database inclusion threshold via API calls like https://search-api.het.io/v1/metapaths/source/17054/target/6602/ (this is what the webapp uses).
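
As a rough illustration of that suggestion, the Python sketch below sums -log10(p_value) over the metapath records returned for one node pair. The field name `p_value` comes from the JSON record quoted later in this thread. The top-level key holding the list of records (`path_counts` here) is an assumption that should be checked against an actual response, and skipping missing or non-positive p-values is just one possible choice.

```python
import json
import math
from urllib.request import urlopen

# Base URL taken from the example API call in this thread.
API_ROOT = "https://search-api.het.io/v1"


def fetch_metapath_records(source_id, target_id, records_key="path_counts"):
    """Fetch the metapath records for one (source, target) node pair.

    ASSUMPTION: the records sit under a top-level key named `records_key`.
    Inspect a real response and adjust the key if it differs.
    """
    url = f"{API_ROOT}/metapaths/source/{source_id}/target/{target_id}/"
    with urlopen(url) as response:
        payload = json.load(response)
    return payload[records_key]


def summed_path_score(records):
    """Sum -log10(p_value) over metapath records."""
    total = 0.0
    for record in records:
        p = record.get("p_value")
        if p is None or p <= 0:
            # -log10 is undefined here; skipping is a judgment call.
            continue
        total += -math.log10(p)
    return total
```

Calling `summed_path_score(fetch_metapath_records(17054, 6602))` would then give the sum for the example pair, limited to the metapaths that pass the database inclusion threshold.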

Are you interested in all metapaths (up to a given length), or are some metapaths more interesting for your application?

ayujain04 commented 1 year ago

Okay, that makes sense. Thank you! I am interested in all the metapaths (up to a given length) that are significant enough. Would I be able to use that API call to do that?

dhimmel commented 1 year ago

Yes, although you will likely need the mapping from Neo4j internal identifiers to persistent disease/compound IDs. You can get it by running this Cypher query at https://neo4j.het.io/browser/:

MATCH (node)
WHERE node:Compound OR node:Disease
RETURN id(node) AS id, node.identifier AS identifier, node.name AS name, labels(node)[0] AS type
ORDER BY type, identifier

Then you can use those ids for the API calls above. How many node pairs do you want to do this for? If it's a very large number, you might be better off running the queries against the PostgreSQL database directly at search-db.het.io.
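
A minimal sketch of that lookup step in Python, assuming the query results have been exported (for example as CSV from the Neo4j browser) and loaded as rows with the columns `id`, `identifier`, `name`, and `type` from the query above. The function names are hypothetical, and the numeric ids in the usage note are placeholders rather than real Neo4j ids.

```python
# Base URL taken from the example API call in this thread.
API_ROOT = "https://search-api.het.io/v1"


def build_id_lookup(rows):
    """Map persistent identifiers (e.g. 'DB00331') to Neo4j internal ids."""
    return {row["identifier"]: row["id"] for row in rows}


def metapaths_url(lookup, source_identifier, target_identifier):
    """Build the metapaths API URL for a pair of persistent identifiers."""
    source_id = lookup[source_identifier]
    target_id = lookup[target_identifier]
    return f"{API_ROOT}/metapaths/source/{source_id}/target/{target_id}/"


# Placeholder rows: the ids 111 and 222 are made up for illustration.
example_rows = [
    {"id": 111, "identifier": "DB00331", "name": "Metformin", "type": "Compound"},
    {"id": 222, "identifier": "DOID:1612", "name": "breast cancer", "type": "Disease"},
]
example_lookup = build_id_lookup(example_rows)
example_url = metapaths_url(example_lookup, "DB00331", "DOID:1612")
```

Looping over every compound identifier with one fixed disease identifier would produce the list of URLs to request for a disease-specific analysis.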

ayujain04 commented 1 year ago

Thanks for the response!

For reference: I am trying to build a disease-specific hypergraph.

To do so, I will need each metapath from every drug in the graph to a specific disease node.

It would be helpful to be able to query, say, Metformin and Dementia and get every metapath from Metformin to Dementia as a CSV file.

Is this possible?

ayujain04 commented 1 year ago

For example, the sample API query that you provided returns a list of metapaths from the source node id to the target node id.

One such metapath is listed below. However, how can I get the ids of each node in each metapath? I will need this to construct the hypergraph, as each of these nodes will be in a single hyperedge between one drug and one disease.

{
            "id": 72430549,
            "adjusted_p_value": 0.045993636486382335,
            "path_count": 126,
            "dwpc": 4.386227813718969,
            "p_value": 0.0003801126982345647,
            "reversed": false,
            "metapath_abbreviation": "CbGdAlD",
            "metapath_name": "Compound–binds–Gene–downregulates–Anatomy–localizes–Disease",
            "metapath_length": 3,
            "metapath_path_count_density": 0.590437,
            "metapath_path_count_mean": 4.04788,
            "metapath_path_count_max": 372,
            "metapath_dwpc_raw_mean": 0.000121205,
            "metapath_n_similar": 121,
            "metapath_p_threshold": 1.0,
            "metapath_id": "CbGdAlD",
            "metapath_reversed": false,
            "metapath_metaedges": [
                [
                    "Compound",
                    "Gene",
                    "binds",
                    "both"
                ],
                [
                    "Gene",
                    "Anatomy",
                    "downregulates",
                    "both"
                ],
                [
                    "Anatomy",
                    "Disease",
                    "localizes",
                    "both"
                ]
            ],
            "dgp_id": 21763190,
            "dgp_source_degree": 56,
            "dgp_target_degree": 39,
            "dgp_n_dwpcs": 800,
            "dgp_n_nonzero_dwpcs": 791,
            "dgp_nonzero_mean": 2.1106664913501048,
            "dgp_nonzero_sd": 0.5350022943256412,
            "dgp_reversed": false,
            "cypher_query": "MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:DOWNREGULATES_AdG]-(n2)-[:LOCALIZES_DlA]-(n3:Disease)\nUSING JOIN ON n1\nWHERE n0.identifier = 'DB00331' // Metformin\nAND n3.identifier = 'DOID:1612' // breast cancer\nWITH\n[\nsize((n0)-[:BINDS_CbG]-()),\nsize(()-[:BINDS_CbG]-(n1)),\nsize((n1)-[:DOWNREGULATES_AdG]-()),\nsize(()-[:DOWNREGULATES_AdG]-(n2)),\nsize((n2)-[:LOCALIZES_DlA]-()),\nsize(()-[:LOCALIZES_DlA]-(n3))\n] AS degrees, path\nWITH path, reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.5) AS PDP\nWITH collect({paths: path, PDPs: PDP}) AS data_maps, count(path) AS PC, sum(PDP) AS DWPC\nUNWIND data_maps AS data_map\nWITH data_map.paths AS path, data_map.PDPs AS PDP, PC, DWPC\nRETURN\n  path AS neo4j_path,\n  substring(reduce(s = '', node IN nodes(path)| s + '–' + node.name), 1) AS path,\n  PDP,\n  100 * (PDP / DWPC) AS percent_of_DWPC\nORDER BY percent_of_DWPC DESC\nLIMIT 10"
        },
dhimmel commented 1 year ago

> How can I get the ids of each node in each metapath

Terminology correction: metapaths contain metanodes (like Anatomy or Disease) rather than nodes. Actual paths are what contain nodes (e.g. Metformin & Dementia).

I'm not sure about the rest of the question, but search-api.het.io has an endpoint to get the paths for a given source node, target node, and metapath combination.
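
Once actual paths are available, the hyperedge described earlier in the thread (all nodes on any path between one drug and one disease) could be assembled roughly as in the Python sketch below. How the paths endpoint serializes its results is not shown in this thread, so the input format here (each path as a list of node identifiers) is an assumption, and the intermediate node names are placeholders.

```python
def hyperedge_from_paths(paths):
    """Collect every distinct node that appears on any path into one hyperedge."""
    hyperedge = set()
    for path in paths:
        hyperedge.update(path)
    return frozenset(hyperedge)


# Placeholder paths between one compound and one disease.
# "GENE_A", "GENE_B", and "ANATOMY_X" are made-up node names.
example_paths = [
    ["Metformin", "GENE_A", "ANATOMY_X", "breast cancer"],
    ["Metformin", "GENE_B", "ANATOMY_X", "breast cancer"],
]
example_hyperedge = hyperedge_from_paths(example_paths)
```

A frozenset is used so that the hyperedge can itself serve as a dictionary key or set member, for example when indexing hyperedges by (compound, disease) pair.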

Regarding the JSON output from the API, a tool like pandas could help you convert it to CSV (CSV and JSON are just different encodings of data).
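
As one way to do that conversion, the sketch below writes a list of metapath records to CSV text. It uses Python's standard `csv` module rather than pandas so that it has no dependencies. The column names are taken from the JSON record quoted above, and the function name is hypothetical. Fields that are not listed in `columns` (such as the nested `metapath_metaedges`) are dropped.

```python
import csv
import io


def records_to_csv(records, columns):
    """Serialize a list of JSON-style dicts to CSV text, keeping only `columns`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop fields not listed in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# A trimmed copy of the record quoted in this thread.
example_records = [
    {
        "metapath_abbreviation": "CbGdAlD",
        "path_count": 126,
        "metapath_length": 3,
        "reversed": False,
    }
]
example_csv = records_to_csv(
    example_records, ["metapath_abbreviation", "path_count", "metapath_length"]
)
```

With pandas installed, building a `DataFrame` from the same list of records and calling `to_csv` would achieve the same result.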

ayujain04 commented 1 year ago

Sure. The JSON allowed me to view the metapaths, which is not that helpful for what I am trying to build.

The JSON also provided Cypher queries whose results, if I could download them as CSV, would give me the paths of actual nodes (which is what I need to construct this hypergraph).

However, because of the number of queries, Neo4j times out. Is there an API that I can call that would return the actual paths?