Closed ayujain04 closed 1 year ago
Another possibility is: For each (drug, disease) pair of interest, extract the list of meta-paths automatically, as a CSV file?
get a sum of the path scores of all metapaths from a source node to a target node
Summing all the path scores is not a metric we have explored. For reference from this manuscript:
The path score equals the proportion of the DWPC contributed by a path multiplied by the magnitude of the DWPC’s p-value (-log10(p)).
Therefore, if you wanted to sum path scores across all metapaths, you don't actually need to know the individual paths. You could sum the -log10(p-value) for each metapath. You could get p-values for metapaths whose significance exceeds the database inclusion threshold via API calls like https://search-api.het.io/v1/metapaths/source/17054/target/6602/ (this is what the webapp uses).
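As a rough sketch of the sum described above (not something the project ships), the following Python adds up -log10(p) over the metapath records returned for one node pair. The field names `p_value` and `metapath_length` come from the API output quoted in this thread. The `path_counts` key used to pull the record list out of the response, and the helper names themselves, are assumptions to check against a real response.

```python
import json
import math
from urllib.request import urlopen

API_ROOT = "https://search-api.het.io/v1"


def fetch_metapath_records(source_id, target_id):
    """Fetch metapath records for one node pair (ids are Neo4j internal ids)."""
    url = f"{API_ROOT}/metapaths/source/{source_id}/target/{target_id}/"
    with urlopen(url) as response:
        payload = json.load(response)
    # Assumption: the list of metapath records sits under "path_counts".
    return payload["path_counts"]


def sum_neg_log10_p(records, p_field="p_value", max_length=None):
    """Sum -log10(p) over metapath records, optionally capping metapath length."""
    total = 0.0
    for record in records:
        if max_length is not None and record["metapath_length"] > max_length:
            continue
        p = record.get(p_field)
        # Skip records with a missing or non-positive p-value (log undefined).
        if p is None or p <= 0:
            continue
        total += -math.log10(p)
    return total
```

For the example pair above this would be called as `sum_neg_log10_p(fetch_metapath_records(17054, 6602))`. Passing `p_field="adjusted_p_value"` switches to the adjusted p-value if that is the quantity you want to sum.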
Do you think you are interested in all metapaths (up to a given length) or are some metapaths more interesting for your application?
Okay, that makes sense. Thank you! I am interested in all the metapaths (up to a given length) that are significant enough. Would I be able to use that API call to do that?
Yes, you will likely need to get the mapping of Neo4j internal identifiers to persistent disease/compound IDs. You can do that with this Cypher query at https://neo4j.het.io/browser/:
MATCH (node)
WHERE node:Compound OR node:Disease
RETURN id(node) AS id, node.identifier AS identifier, node.name AS name, labels(node)[0] AS type
ORDER BY type, identifier
Then you can use those ids for the API calls above. How many node pairs do you want to do this for? If it's a very large number, you might be better off running the queries against the PostgreSQL database directly at search-db.het.io.
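A minimal sketch of the lookup step, assuming the result of the Cypher query above has been exported as CSV with the columns `id`, `identifier`, `name`, and `type` (the aliases in its RETURN clause). The helper names are hypothetical; the URL pattern is the one from the API call earlier in this thread.

```python
import csv
import io

API_ROOT = "https://search-api.het.io/v1"


def load_id_map(csv_text):
    """Map persistent identifier (e.g. DB00331) to Neo4j internal id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["identifier"]: int(row["id"]) for row in reader}


def metapaths_url(id_map, source_identifier, target_identifier):
    """Build the metapaths API URL for a compound/disease identifier pair."""
    source_id = id_map[source_identifier]
    target_id = id_map[target_identifier]
    return f"{API_ROOT}/metapaths/source/{source_id}/target/{target_id}/"
```

With the mapping loaded once, a loop over every Compound identifier against a single Disease identifier yields one URL per drug for that disease.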
Thanks for the response!
For reference: I am trying to build a disease specific hypergraph.
In order to do so, I will need each metapath from every drug in the graph to a specific disease node.
It would be helpful to be able to query, say, Metformin and Dementia and then get a CSV file of every metapath from Metformin to Dementia.
Is this possible?
For example, the sample API query that you provided returns a list of metapaths from the source node id to the target node id, with one such entry listed below.
However, how can I get the ids of each node in each metapath? I will need this to construct the hypergraph, as each of these nodes will be in a single hyperedge between one drug and one disease.
{
"id": 72430549,
"adjusted_p_value": 0.045993636486382335,
"path_count": 126,
"dwpc": 4.386227813718969,
"p_value": 0.0003801126982345647,
"reversed": false,
"metapath_abbreviation": "CbGdAlD",
"metapath_name": "Compound–binds–Gene–downregulates–Anatomy–localizes–Disease",
"metapath_length": 3,
"metapath_path_count_density": 0.590437,
"metapath_path_count_mean": 4.04788,
"metapath_path_count_max": 372,
"metapath_dwpc_raw_mean": 0.000121205,
"metapath_n_similar": 121,
"metapath_p_threshold": 1.0,
"metapath_id": "CbGdAlD",
"metapath_reversed": false,
"metapath_metaedges": [
[
"Compound",
"Gene",
"binds",
"both"
],
[
"Gene",
"Anatomy",
"downregulates",
"both"
],
[
"Anatomy",
"Disease",
"localizes",
"both"
]
],
"dgp_id": 21763190,
"dgp_source_degree": 56,
"dgp_target_degree": 39,
"dgp_n_dwpcs": 800,
"dgp_n_nonzero_dwpcs": 791,
"dgp_nonzero_mean": 2.1106664913501048,
"dgp_nonzero_sd": 0.5350022943256412,
"dgp_reversed": false,
"cypher_query": "MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:DOWNREGULATES_AdG]-(n2)-[:LOCALIZES_DlA]-(n3:Disease)\nUSING JOIN ON n1\nWHERE n0.identifier = 'DB00331' // Metformin\nAND n3.identifier = 'DOID:1612' // breast cancer\nWITH\n[\nsize((n0)-[:BINDS_CbG]-()),\nsize(()-[:BINDS_CbG]-(n1)),\nsize((n1)-[:DOWNREGULATES_AdG]-()),\nsize(()-[:DOWNREGULATES_AdG]-(n2)),\nsize((n2)-[:LOCALIZES_DlA]-()),\nsize(()-[:LOCALIZES_DlA]-(n3))\n] AS degrees, path\nWITH path, reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.5) AS PDP\nWITH collect({paths: path, PDPs: PDP}) AS data_maps, count(path) AS PC, sum(PDP) AS DWPC\nUNWIND data_maps AS data_map\nWITH data_map.paths AS path, data_map.PDPs AS PDP, PC, DWPC\nRETURN\n path AS neo4j_path,\n substring(reduce(s = '', node IN nodes(path)| s + '–' + node.name), 1) AS path,\n PDP,\n 100 * (PDP / DWPC) AS percent_of_DWPC\nORDER BY percent_of_DWPC DESC\nLIMIT 10"
},
How can I get the ids of each node in each metapath
Terminology correction: metapaths contain metanodes (like Anatomy or Disease) rather than nodes. Actual paths are what contain nodes (e.g. Metformin & Dementia).
I'm not sure about the rest of the question, but search-api.het.io has an endpoint to get the paths for a given source node, target node, and metapath combination.
Regarding the JSON output from the API, a tool like pandas could help you convert it to CSV (CSV and JSON are just different encodings of data).
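As an illustration of that conversion, here is a small sketch using only the Python standard library. The column names are taken from the JSON record quoted earlier in this thread; which columns to keep, and the function name, are choices made for this example. With pandas installed, `pandas.DataFrame(records).to_csv(path, index=False)` would be a shorter route.

```python
import csv
import io

# Scalar fields from the metapath records; nested fields such as
# "metapath_metaedges" and long ones such as "cypher_query" are left out.
COLUMNS = [
    "metapath_abbreviation",
    "metapath_length",
    "path_count",
    "dwpc",
    "p_value",
    "adjusted_p_value",
]


def records_to_csv(records, columns=COLUMNS):
    """Serialize a list of metapath record dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # silently drop keys not listed in columns
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

The returned text can be written to a file with `open("metapaths.csv", "w").write(...)`, producing one row per metapath for a given node pair.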
Sure, so the JSON allowed me to view the metapaths, which is not that helpful for what I am trying to build.
The JSON also provided Cypher queries whose results, if I could download them as CSV, would give me the paths of actual nodes (what I need to construct this hypergraph).
However, because of the number of queries, Neo4j times out. Is there an API that I can call that would return the actual paths?
Hey!
Is there a way for me to query Hetionet to get a sum of the path scores of all metapaths from a source node to a target node? I attached an image of how I can do this with the connectivity search GUI. However, I was wondering if there is a way for me to make many queries where I can get the path count sum, without having to manually enter it into the website each time.