greenelab / connectivity-search-analyses

hetnet connectivity search research notebooks (previously hetmech)
BSD 3-Clause "New" or "Revised" License
9 stars 5 forks source link

Implement Degree Weighted Path Count (DWPC) #20

Closed kkloste closed 7 years ago

kkloste commented 7 years ago

The diffusion code we have in https://github.com/greenelab/hetmech/blob/e387eee580cfe3d530a21c92df025afd9f668784/hetmech/diffusion.py can be made to compute the DWPC for an input metapath. Implement this functionality -- @dhimmel in which module should we put this function?

dhimmel commented 7 years ago

which module should we put this function?

I think it could go in diffusion.py. Or you can make a new module path_count.py.

We will also want a basic path_count function that will count the number of paths without any degree normalization.

For reference, here are existing implementations of DWPC:

For now, let's ignore the issue of duplicate nodes in a path.

kkloste commented 7 years ago

A note for the sake of technical correctness: a path with duplicate nodes is called a "trail", and is not a "path".

Also: if the trail includes duplicate edges, it is technically called a "walk", and is not a trail or a path any more.

kkloste commented 7 years ago

The current diffusion.py functionality already enables counting the number of walks between any source node and any target node -- simply call diffusion.diffuse_along_metapath with row_damping = 0 and column_damping = 0.

dhimmel commented 7 years ago

The current diffusion.py functionality already enables counting the number of walks between any source node and any target node

Cool. diffusion.diffuse_along_metapath is very flexible.

A note for the sake of technical correctness: a path with duplicate nodes is called a "trail", and is not a "path".

So should we call what we're calculating DWWC (degree weighted walk count)?

kkloste commented 7 years ago

So should we call what we're calculating DWWC (degree weighted walk count)?

I think yes, otherwise the name is misleading. It is worth noting, though, that DWWC = DWPC if the metapath does not repeat node-types.

kkloste commented 7 years ago

@dhimmel -- given that diffuse_along_metapath implements DWPC exactly in the case that no node-types are repeated, do you still want a separate function that implements DWPC in the case that there are repeated node-types?

dhimmel commented 7 years ago

I think we should change the name of the diffuse_along_metapath function to dwwc_traverse or similar (suggestions welcome). We can always wrap the function with hardcoded variable for the special cases.

do you still want a separate function that implements DWPC in the case that there are repeated node-types?

That would be fanstastic, but I was under the impression that this would be a substantial engineering task. Ideally, we would exclude walks (and also probably trails) for hetmech. I just don't see them contributing useful information. How feasible is this? What will the computational penalty be?

kkloste commented 7 years ago

I agree with the name change. We can then have some function named diffuse call the function dwwc_along_metapath with hardcoded parameter values, as you suggested.

The general task of "compute all paths" would be very expensive, but that is a much broader task than what we are considering. I feel pretty sure (but not yet 100%) that we could accomplish this with only a small constant factor penalty (3x more time, total?).

This depends on the number of times a node-type is repeated, and the number of different node-types that are repeated. In addition to the computational complexity depending on those items, the complexity of the code will also depend on those items.

kkloste commented 7 years ago

I just finished some analysis, and can confirm: assuming exactly one node-type gets repeated exactly one time, then we can compute the DWPC in no more than 2x the time, using only slightly more memory, compared to DWWC.

I could probably implement that (repeating no more than 1 node-type, one time) pretty soon (end of the week?). If you want something more complicated than that, it might take me a while.

dhimmel commented 7 years ago

I agree with the name change. We can then have some function named diffuse call the function dwwc_along_metapath with hardcoded parameter values, as you suggested.

Okay, after we merge #23, let's do the rename.

assuming exactly one node-type gets repeated exactly one time, then we can compute the DWPC in no more than 2x the time, using only slightly more memory, compared to DWWC.

That's fantastic. This could revolutionize the usability of DWPCs and allow us to budget more computation for permuted networks. The single metanode duplication case will cover lot's of metapaths. A general solution would be the best, but I'm guessing that solving the easy case, will make it easier to solve the general case. Let me know if it makes sense to brainstorm before implementing this, or instead whether you should just jump right in.

dhimmel commented 7 years ago

FYI #28 implemented DWWC (degree-weighted walk count).

For me to do: make a template notebook with a DWPC challenge for @kkloste.