greenelab / connectivity-search-backend

Django backend for hetnet connectivity search
https://search-api.het.io
BSD 3-Clause "New" or "Revised" License

Return path count for non-precomputed rows #64

Closed dhimmel closed 5 years ago

dhimmel commented 5 years ago

Currently, when we compute DWPC and path information on the fly, we do not set path_count:

https://github.com/greenelab/hetmech-backend/blob/6b00ffe58664db9941e3bf6ce89596e9bc7f9404/dj_hetmech_app/utils/paths.py#L97

The reason for this limitation is that hetnetpy.neo4j.construct_dwpc_query only returns percent_of_DWPC. This may be due to a Cypher/Neo4j limitation: intermediate values cannot be returned separately from the resulting table.

One solution would be to change the Cypher query so that PC and DWPC are repeated on every path row, like:

MATCH path = (n0:Compound)-[:BINDS_CbG]-(n1)-[:PARTICIPATES_GpPW]-(n2)-[:PARTICIPATES_GpPW]-(n3)-[:ASSOCIATES_DaG]-(n4:Disease)
USING JOIN ON n2
WHERE n0.identifier = 'DB01156'
AND n4.identifier = 'DOID:0050742'
AND n1 <> n3
WITH
  [
    size((n0)-[:BINDS_CbG]-()),
    size(()-[:BINDS_CbG]-(n1)),
    size((n1)-[:PARTICIPATES_GpPW]-()),
    size(()-[:PARTICIPATES_GpPW]-(n2)),
    size((n2)-[:PARTICIPATES_GpPW]-()),
    size(()-[:PARTICIPATES_GpPW]-(n3)),
    size((n3)-[:ASSOCIATES_DaG]-()),
    size(()-[:ASSOCIATES_DaG]-(n4))
  ] AS degrees, path
WITH path, reduce(pdp = 1.0, d in degrees| pdp * d ^ -0.5) AS PDP
WITH collect({paths: path, PDPs: PDP}) AS data_maps, count(path) AS PC, sum(PDP) AS DWPC
UNWIND data_maps AS data_map
WITH data_map.paths AS path, data_map.PDPs AS PDP, PC, DWPC
RETURN
  substring(reduce(s = '', node IN nodes(path)| s + '–' + node.name), 1) AS path,
  PDP,
  100 * (PDP / DWPC) AS percent_of_DWPC,
  PC, DWPC
ORDER BY percent_of_DWPC DESC

Returning a table like:

[image: query result table with columns path, PDP, percent_of_DWPC, PC, DWPC]
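
On the backend side, the repeated PC and DWPC columns would then need to be read once and dropped from the per-path rows. The following is only a rough sketch of that step, not the code in dj_hetmech_app: the function names `path_degree_product` and `split_summary` are hypothetical, and it assumes the query results arrive as a list of dicts keyed by the RETURN aliases above. It also assumes that a query matching no paths yields zero rows (since UNWIND on an empty list emits nothing), so path_count falls back to 0 in that case.

```python
from math import prod


def path_degree_product(degrees, damping=0.5):
    """Mirror the Cypher reduce(): multiply each degree raised to -damping.

    The default of 0.5 is taken from the `d ^ -0.5` exponent in the query.
    """
    return prod(d ** -damping for d in degrees)


def split_summary(rows):
    """Separate the repeated PC/DWPC values from the per-path rows.

    `rows` is assumed to be a list of dicts keyed by the RETURN aliases
    (path, PDP, percent_of_DWPC, PC, DWPC). Hypothetical helper.
    """
    if not rows:
        # Assumption: no matching paths means the query returned no rows.
        return {"path_count": 0, "dwpc": 0.0, "paths": []}
    first = rows[0]
    paths = [
        {key: value for key, value in row.items() if key not in ("PC", "DWPC")}
        for row in rows
    ]
    return {"path_count": first["PC"], "dwpc": first["DWPC"], "paths": paths}


# Made-up rows for illustration only; these are not real query results.
example_rows = [
    {"path": "A-B-C", "PDP": 0.03, "percent_of_DWPC": 75.0, "PC": 2, "DWPC": 0.04},
    {"path": "A-D-C", "PDP": 0.01, "percent_of_DWPC": 25.0, "PC": 2, "DWPC": 0.04},
]
summary = split_summary(example_rows)
```

Because PC and DWPC are identical on every row, reading them from the first row is enough, and stripping them from the per-path dicts keeps the payload from carrying the same two numbers once per path.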