greenelab / connectivity-search-analyses

hetnet connectivity search research notebooks (previously hetmech)
BSD 3-Clause "New" or "Revised" License

Alternatives to XSwap for hetnet permutation #134

Closed dhimmel closed 4 years ago

dhimmel commented 6 years ago

We had a hetmech conference call today with @bdsullivan @cgreene @zietzm @kkloste and Drew van der Poel. @bdsullivan brought up the possibility of using methods other than XSwap to create permuted hetnets. In the past, we've discussed implementations of XSwap in https://github.com/greenelab/hetmech/issues/46 but have not evaluated completely different methods of generating permuted networks.

Here is the description of our current method from the Project Rephetio manuscript:

From Hetionet, we derived five permuted hetnets (Himmelstein, 2016b). The permutations preserve node degree but eliminate edge specificity by employing an algorithm called XSwap to randomly swap edges (Hanhijärvi et al., 2009). To extend XSwap to hetnets (Himmelstein and Baranzini, 2015a), we permuted each metaedge separately, so that edges were only swapped with other edges of the same type. We adopted a Markov chain approach, whereby the first permuted hetnet was generated from Hetionet v1.0, the second permuted hetnet was generated from the first, and so on. For each metaedge, we assessed the percent of edges unchanged as the algorithm progressed to ensure that a sufficient number of swaps had been performed to randomize the network (Himmelstein, 2016b). Permuted hetnets are useful for computing the baseline performance of meaningless edges while preserving node degree (Himmelstein, 2015l).

Currently, our HetMat data structure has a function for generating permuted hetnets using XSwap, which delegates to hetmech.matrix.permute_matrix, which delegates to hetio.permute.permute_pair_list. We've now generated 200 permuted hetnets derived from Hetionet v1.0.

@bdsullivan mentioned some other methods for generating permuted graphs. My notes include random-graph model, configuration model, and Chung Lu. So if anyone wants to suggest an alternative and how it would differ in terms of output from XSwap, that would be a good starting point.

bdsullivan commented 6 years ago

(below is copied over from Slack) @kkloste [10:00 AM, July 26] @dhimmel -- it was my impression from our conversation that the "permuted networks" that you make are not simply graphs with a specified degree sequence, but have a bit more structure. More specifically, it seems like (to use a simplified example) the graph has several partitions ( A, B and C, say), and you want the permuted network to have the same number of nodes in each partition, as well as the same "between-partition-degree".

To clarify: if node v in A has 3 neighbors in A, 5 neighbors in B, and 7 neighbors in C, is it the case that you want every permuted network to have node v with 3 neighbors in A, 5 in B, 7 in C?

If yes, then the vanilla configuration model is not what we want here.

@dhimmel [10:49 AM] @kkloste for a given metaedge (type of edge), the edge swaps are performed without any sort of partitioning to retain community structure. Each metaedge is permuted separately and when we refer to degree, we mean degree for a specific metaedge. As such, to create a permuted hetnet, we are actually calling permute_pair_list 24 times (once for each metaedge).

kkloste commented 6 years ago

Ok, I think I'm starting to understand, @dhimmel . But I want to make sure I get what permute_pair_list is doing. Can you correct me if this understanding is wrong?

You start by specifying a particular metaedge, e.g. "A to B".

"Permuting the metaedge" means that we look at all the actual edges in "A to B", and for each edge, potentially swap its endpoints with those of other edges.

"Preserving degree" then means that when you permute within metaedge "A to B", each node v in A should have the same number of neighbors in B when the shuffling is done (compared with the number of neighbors v had in B prior to shuffling). And the same is true of each node in B.

Is this accurate, @dhimmel?

dhimmel commented 6 years ago

I want to make sure I get what permute_pair_list is doing.

Might as well kill :bird: :bird: with one :gem:, so I opened https://github.com/hetio/hetio/pull/24 to add docstrings to the hetio.permute functions. @kkloste I would be happy if you reviewed that PR.

"Preserving degree" then means that when you permute within metaedge "A to B", each node v in A should have the same number of neighbors in B when the shuffling is done (compared with the number of neighbors v had in B prior to shuffling). And the same is true of each node in B.

Yes. There is an extra qualification that must be added however. The edge swaps are performed separately for each metaedge. Hence, you could have a situation where there are two types of edges between metanode A and metanode B. Let's assume there is AxB and AyB where x and y are different edge types. In this case, each A or B node has an x degree and a y degree. Both of these degrees are preserved, because edge sets of different types are swapped completely independently.

"Permuting the metaedge" means that we look at all the actual edges in "A to B", and for each edge, potentially swap its endpoints with those of other edges.

Applying the same qualification as above, then yes. The algorithm proceeds by randomly picking two edges (of the same type) and attempting a swap. Sometimes a swap is invalid (for example it could result in an existing edge) and hence is skipped. This process occurs iteratively many times where two edges are selected and a swap of the endpoints is attempted.

kkloste commented 6 years ago

Thanks for the clarifications @dhimmel , this helps immensely! Happy to review the PR.

kkloste commented 6 years ago

@dhimmel Ok, finally had the chance to finish my analysis of the Xswap algorithm, regarding how well the procedure samples uniformly at random from the space of graphs that match our desired structure (namely, having the exact degree sequence of the dataset for which we're trying to compute null distribution values).

One complication is that we want a specific, fixed degree sequence (not an expected degree distribution like the Erdos-Renyi or Chung-Lu models), and that makes it a bit difficult to see what "the space of graphs matching our desired structure" looks like.

Say we are looking at just meta-edge "A to B", so our graph is essentially bipartite (A nodes and B nodes), and each node has a specified target degree value, i.e. u in A must have degree d(u). Let m be the total number of edges, and note that since this graph is bipartite we have

m = sum_{u in A} d(u) = sum_{v in B} d(v)

formula-1

A Chung-Lu model of this graph would assign each edge (u,v) the probability

P(u,v) = d(u)*d(v)/m.

formula-2

In contrast, with the Xswap procedure, if carried out an arbitrarily large number of times, the probability of edge (u,v) being present approaches

P(u,v) = d(u)*d(v)/( m + d(u)*d(v) - d(u)-d(v)+1 )

formula-3

which is slightly less than for Chung-Lu.

I'm not saying this means "Xswap is wrong", rather I believe Xswap is sampling from a different space of graphs.

That said, it's really unclear to me how to reason about "sampling uniformly at random from the space of graphs with the exact specified degree sequence" --- I don't think it's possible to count the number of graphs in that space (in a simple, closed formula kind of way). Meaning our current target graph space seems to me to be too difficult to reason about analytically.

Furthermore, I also am not convinced that this is really the space of graphs we want to be sampling from --- since data is incomplete and has errors, do we want to enforce an exact degree sequence? Or just a degree distribution that has the observed degree sequence as its expected outcome?

The good news is that if we switch to sampling from "graphs with an expected degree sequence", then we can compute the "null distribution" values deterministically, without any xswap procedure, for all metapaths, in a pretty straightforward manner.

dhimmel commented 6 years ago

Analytically deriving the probability that an edge exists in a permuted network

Thanks @kkloste for your investigation. In this comment I want to address the probability of an edge existing in a permuted network. I'll address everything else in another comment.

Specifically, in Project Rephetio and @danich1's current edge prediction work, it is helpful to know the "prior probability" of an edge, which we define as the probability it exists in an XSwap-derived permutation. In Project Rephetio, we wanted to estimate the prior probability of treatment (Compound-Disease edges) for the reasons described here. We ended up empirically calculating this probability by generating many XSwap bipartite treatment networks and seeing how often edges of a specific source-target degree combination existed (table).
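A hedged sketch of that degree-grouping computation (function and variable names are my own, not from the Rephetio code): for each (source degree, target degree) combination, average edge presence over all node pairs with those degrees, across all permuted networks.

```python
from collections import defaultdict
from itertools import product

def degree_grouped_priors(permuted_graphs, source_degrees, target_degrees):
    """Empirical XSwap prior per (source degree, target degree) combination.

    permuted_graphs: list of sets of (source, target) edges.
    source_degrees / target_degrees: node -> degree in the original network.
    """
    hits = defaultdict(int)
    trials = defaultdict(int)
    n = len(permuted_graphs)
    for u, v in product(source_degrees, target_degrees):
        key = (source_degrees[u], target_degrees[v])
        trials[key] += n
        hits[key] += sum((u, v) in graph for graph in permuted_graphs)
    return {key: hits[key] / trials[key] for key in trials}
```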

@antoine-lizee had initially proposed an analytical method for computing the XSwap prior probability, but we realized it was wrong because the probabilities didn't sum to 755 (the actual number of treatment edges). In subsequent discussion, @antoine-lizee wrote:

To grasp the complexity of the problem, we can see that the prior probability for one particular couple Ci-Dj is not only dependent on the degrees of the two nodes Ci and Dj, but on the degrees of all the Compound and Disease nodes. As a result, it is complicated (maybe impossible) to derive an exact analytical solution

Hence, we abandoned the idea of an analytical solution to the XSwap prior, although it would be useful (not necessarily for hetmech but more generally). At this point, we have three analytical solutions:

  1. Lizee: @antoine-lizee's prior, which we know is improperly scaled. The Lizee estimate is defined as source_degree * target_degree / (n_sources * n_targets)
  2. Chung-Lu: as provided by @kkloste and not an estimate of the XSwap prior. Defined as source_degree * target_degree / n_edges.
  3. Kloster: @kkloste's analytical estimate of the XSwap prior defined as source_degree * target_degree / (n_edges + source_degree * target_degree - source_degree - target_degree + 1). FYI @zietzm is curious about how @kkloste derived this!
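For reference, the three estimates as functions (my own transcription). For example, a compound/disease degree pair of (19, 68) with m = 755 treatment edges gives a Chung-Lu value above 100% and a Kloster value of about 66%:

```python
def lizee_prior(d_source, d_target, n_sources, n_targets):
    # 1. Lizee estimate (known to be improperly scaled).
    return d_source * d_target / (n_sources * n_targets)

def chung_lu_prior(d_source, d_target, n_edges):
    # 2. Chung-Lu; not an estimate of the XSwap prior.
    return d_source * d_target / n_edges

def kloster_prior(d_source, d_target, n_edges):
    # 3. Kloster's analytical estimate of the XSwap prior.
    return d_source * d_target / (
        n_edges + d_source * d_target - d_source - d_target + 1
    )
```

Note that the Chung-Lu formula can exceed 1 for high-degree pairs, while the Kloster denominator is never smaller than the edge count, so its value stays at or below the Chung-Lu value.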

I compared all three analytic estimates to the empirical probabilities computed for Project Rephetio (notebook). The analytic values are on the x-axis and the empirical values are on the y-axis (both have been logit transformed):

estimated-xswap-priors

A perfect analytic solution would scatter around the line y = x. The scaling issue with the Lizee method is immediately apparent. Chung-Lu estimates the prior well for low degree pairs, but has issues with higher degree nodes. In fact, the formula produces values over 100% for several of the higher degree node pairs. The Kloster solution is quite good but appears to slightly underestimate the priors for higher-degree combinations.

kkloste commented 6 years ago

@dhimmel Thanks for the wonderful summary.

I should have clarified that the probability I gave is asymptotic: it is the probability of a particular node pair (u,v) being connected by an edge after a sufficiently large number of iterations of xswap are applied to a given graph.

I'm attaching my analysis (see section 1.3 of xswap-analysis.pdf). I had originally derived the exact same formula as Lizee, but discovered a problem similar to the one you described, Daniel. @zietzm please let me know if I can make anything in the write-up clearer. I am not 100% certain it is correct, but I cannot recognize any errors.

Whatever the cause of the discrepancy is between my formula and the observed probability, there seems to be a similar gap in the Lizee plot -- toward the top of both plots the black dots bend a little away from the red line. I wonder what's causing this discrepancy (apparent in Chung-Lu, too) at high degree nodes?

I don't think the number of edges or size of node degree is large enough to cause numerical issues in my formula. One possibility is that the number of iterations of xswap is not high enough for the observed probabilities to converge to my asymptotic bound --- how many iterations of xswap were applied, @dhimmel? I tried determining that from the notebook, but couldn't figure it out.

dhimmel commented 6 years ago

how many iterations of xswap were applied

The empirical XSwap probabilities were calculated from 744,975 permuted networks (description, notebook). For most degree-combinations, degree grouping will result in more than 744,975 observations of edge presence contributing to the empirical average. However, in general, degree grouping provides less of a sample size increase for high degree nodes (since there are fewer other nodes with the same degree).

I updated the notebook (see cell 7) to show the degree pairs with the highest XSwap empiric probabilities:

| compound_treats | disease_treats | xswap_prior | Chung-Lu | Kloster |
|---:|---:|---:|---:|---:|
| 19 | 68 | 79.6% | 171.1% | 65.9% |
| 17 | 68 | 76.9% | 153.1% | 63.3% |
| 15 | 68 | 73.4% | 135.1% | 60.2% |
| 19 | 51 | 73.4% | 128.3% | 58.5% |
| 14 | 68 | 71.5% | 126.1% | 58.5% |
| 17 | 51 | 69.8% | 114.8% | 55.8% |
| 13 | 68 | 69.4% | 117.1% | 56.7% |
| 12 | 68 | 67.0% | 108.1% | 54.7% |
| 15 | 51 | 65.9% | 101.3% | 52.6% |
| 19 | 37 | 65.2% | 93.1% | 50.1% |

Note the Chung-Lu probabilities over 100%. Not sure what to make of these, besides that Chung-Lu cannot handle these high degree pairs? The XSwap empiric and Kloster probabilities are different enough (e.g. 79.6% versus 65.9%) that I do think there is some fundamental difference between them. Not sure it's worth our time to chase it down however.

I should have clarified that the probability I gave is asymptotic

I do think our iteration number should be sufficiently large. We could see if this discrepancy occurs similarly for other edges. Let me check with @danich1 who has generated XSwap probabilities for Disease-associates-Gene edges.

dhimmel commented 6 years ago

I expanded the XSwap-versus-analytical plot to include the Disease-associates-Gene metaedge (notebook). I removed the Lizee panel, since I didn't feel it was offering anything valuable. I also removed the logit transformation:

xswap-versus-analytic-priors

The revised plot indicates that the Kloster method does seem to systematically underestimate probabilities for high-degree node pairs. The Chung-Lu estimate continues to produce probabilities over 100%, which makes me wonder whether we have the right analytical formula for this model?

kkloste commented 6 years ago

@dhimmel Hm, this is strange indeed.

Those are the possibilities I can think of. Any other ideas?

kkloste commented 6 years ago

@dhimmel @zietzm I finally found a flaw in my derivation last week, and recently finished deriving and verifying a new formula that I feel pretty confident is correct. I also ran my own experiments to check correctness, and obtained pretty good correlation. Derivation attached.

The corrected formula is:

$P(u,v) = \frac{d_u d_v}{\binom{m}{2} - \sum_{w \in V} \binom{d_w}{2} - m + d_u + d_v + d_u d_v - 1}$

codecogseqn

Note that my modified Chung-Lu formula was also incorrect, because what we're doing is fundamentally different from Chung-Lu (CL flips a coin for every potential edge, which does not guarantee that $m$ edges exist, whereas we guarantee that $m$ edges always exist). I tried accounting for this when I modified CL, but did so incorrectly.

xswap-analysis.pdf

dhimmel commented 6 years ago

@kkloste super excited to try out the new formula! I played around a bit in this notebook regarding computing the number of wedges (the summation term in the denominator).

If I understand, the summation refers to going through each source node and identifying the degree. Then doing a degree-choose-2 operation, and summing this value for every source node. In other words, this wedge number is computed once from the node degree sequence of source or target nodes.

See in the notebook that doing this with source nodes and target nodes gives different values. Does that make sense? Since the rest of the formula is symmetric between source and target nodes, I'm assuming that the wedge term should be the same for source nodes and target nodes?

kkloste commented 6 years ago

The summation iterates over all vertices (not just source vertices), and for each vertex $u$ adds (degree($u$) choose 2) to the summation.

The reason why this summation gives the desired value is: For each vertex $u$, if you pick any two neighbors $n_1, n_2$, then vertices $n_1, u, n_2$ form a wedge. Thus, every choice of 2 vertices from N(u) gives a distinct wedge. Moreover, every single wedge in the graph is accounted for in this way.

And yes, given a degree sequence you can compute this summation one time, and never have to re-compute it again :-)
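The closed form can be checked against a direct enumeration of wedges on a small graph (illustrative code, my own function names):

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def wedges_from_degrees(degrees):
    # Closed form: sum over ALL vertices of (degree choose 2).
    return sum(comb(d, 2) for d in degrees)

def wedges_brute_force(edges):
    # Enumerate every path n1 - u - n2 directly: for each center vertex u,
    # every unordered pair of distinct neighbors forms one wedge.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return sum(1 for ns in neighbors.values() for _ in combinations(ns, 2))
```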

zietzm commented 6 years ago

@kkloste I have been having some trouble with the latest method update. I really like the wedge element and the way that this counts things, but for some reason I have been unable to show reasonable results. When I try to apply the method to real data, the $\binom{m}{2}$ term totally dominates all other terms in the denominator and all probabilities are very low. I haven't been able to identify what might be causing the error, but I hope to look more closely in the upcoming days.

This figure shows what I mean. It compares your original method with the latest update.

image

I believe that I have a method that will work. However, it depends on the correctness of p and q as defined in the two PDFs. This method does not achieve perfection, but I think it is sound, conditional on the definitions of p and q. I hope to write it up in the next few days and post a PDF here. For now, here is a plot and the formula I used.

image

`P_uv = du * dv * ((du * dv)**2 + (m - du - dv + 1)**2) ** (-0.5)`

kkloste commented 6 years ago

@zietzm I ran into the same problem when I was testing my analysis, but I believe I determined the problem. The discrepancy is because the probability formula is for a single edge existing, whereas the observed frequencies that we're plotting are taken from $m$ edges.

We have thought that we were plotting "observed frequency vs analytic probability". Instead, we should be thinking of it as "number of observed edges vs expected number of edges".

Let N be the number of graphs sampled. For each sampled graph, we see m edges. So we observe m*N total edges. If we multiply the x values by m*N that should balance things out.

dhimmel commented 6 years ago

@kkloste is there any way we can reformulate the formula so it gives the expected probability of a specific edge existing in an XSwapped network? This is the metric that is most useful for our downstream applications.

If we multiply the x values by m*N that should balance things out.

Perhaps this is the required adjustment, but I'm not sure what m is?

kkloste commented 6 years ago

@dhimmel m is the (fixed) number of edges in the graph.

I've been working on a response to your question. The simple answer should be "yes", if we just divide the y-values by N*m instead of multiplying the x-values by N*m, then the plot would be "analytic probability vs observed probability" instead of frequency.

Here are plots from a synthetic network I made and ran xswap on. After generating one initial graph with a fixed number of edges, m, I then perform N=1000 trials of the following: run xswap 1000 times on the graph. Record the edges present. Repeat.

Then, the observed probability of an edge appearing is the # times the edge appeared / (N * m)
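In code, that bookkeeping looks like this (a sketch using the normalization just described; names are mine):

```python
from collections import Counter

def observed_edge_probabilities(sampled_graphs, m):
    """Observed probability of each edge: appearances / (N * m),
    where N is the number of sampled graphs and m is the (fixed)
    number of edges per graph."""
    n_graphs = len(sampled_graphs)
    counts = Counter(edge for graph in sampled_graphs for edge in graph)
    return {edge: count / (n_graphs * m) for edge, count in counts.items()}
```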

Plotting this against the formula yields this:

probability-formula

The values plotted are all between 0 and 1 (i.e. valid probabilities, unlike my erroneous modified Chung-Lu from before), and the correlation seems very strong, but the scaling is for some reason off.

Re-scaling required roughly multiplying by 2, which makes me wonder if I missed a factor of 2 in the formula. (It's possible that, since each instance of xswap technically moves 2 edges instead of 1, I'm failing to account for that.) The rescaled plot:

probability-formula-scaled

The initial graph used is only a 75-by-100 randomly generated bipartite graph. I've done this on a handful of smallish graphs (50-by-75, 100-by-125) with the same results. (The same very strong linear correlation between the probability formula and observation.)

One thing I can't explain -- the scaling factor (which was 2 for this graph) seems to change a bit with the number of xswaps used. The relationship between the two quantities always appears to be very strongly linear, which suggests the meat of the formula is correct.

But the fact that the scaling varies perplexes me.

kkloste commented 6 years ago

I may have found a bug in my code that will explain the variation in the scaling coefficient. It looks like it might just be a factor of 2 (and the variation was caused by a bug). Will report back once I've run more experiments with the corrected code.

kkloste commented 6 years ago

Ok, confirming that with the bug corrected, all experiments yield data that looks right. It looks like I was missing a factor of (1/2) from the probability formula, and this corrected formula is behaving well now across all experiments.

probability-formula-n1000-s1000

kkloste commented 6 years ago

By the way, the black dots represent edges that were present in the initial graph; the red dots are edges from all other graphs. It is an obsolete feature left over from when I was checking that there was no obvious bias from the input graph.

zietzm commented 6 years ago

@kkloste Using your definitions of p and q, I get the following probability of an edge: image See the derivation here -> xswap.pdf.

I have been working with your formula, but am having trouble getting well-fitting results. To clarify, is your method accurately summed-up by the following: (from above PDF)

image

If so, scaling by m for easier comparison with my method and your previous method, I get the following plot, showing your original method, your updated method (scaled), and the method I describe in the PDF.

image

I definitely feel that something is incorrect in my implementation of your method given the striking nonlinearity of the new method.

kkloste commented 6 years ago

@zietzm Thanks for writing that up. I glanced at it, and it seems interesting, but I'm preoccupied with a deadline this week and will have to wait until next week to respond thoughtfully.

To answer your question, I'm not 100% sure your explanation of my method is quite right; I think it is, but just to be totally clear: the quantity N for me is not the number of permutations, but the number of distinct graphs (where each graph is produced by performing a certain number of xswap permutations on an initial graph). I think that's what you meant, but just double checking for clarity. That detail aside, your description (as I understand it) matches my method.

Some thoughts:

"Old Kloster" should be omitted entirely; it's definitely very incorrect (despite the nice correlation it somehow has).

There seems to be something going on near x=0.5 in the plot for both of us -- a kind of inflection point? That might have something to do with the quantity $d_u d_v$ getting very large, which would only happen if there are very large degrees in the graph -- more likely when the graph has a whole lot of edges. It's possible that the plots I showed (where my formula had a clear linear relationship with observation) were for graphs with such low edge density that the terms $d_u d_v$ never got large enough for me to notice the problem.

Can you tell me what the edge density is in the graph(s) you're testing on, @zietzm ?

zietzm commented 6 years ago

In an example graph for DpS:

image

There are 3357 edges, with 137 source nodes and 438 target nodes. Here is a distribution of degrees.

image

Here is another example, DlA with 137x402 nodes and 3602 edges

image

image

kkloste commented 6 years ago

Thanks for that info, @zietzm . The edge density looks to be about 0.065 on these graphs. In contrast, the graphs I worked on had edge densities 0.1 and 0.2, so my "higher edge density" theory doesn't quite explain it.
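For a bipartite metaedge, edge density here is just m divided by the number of possible source-target pairs; a quick check with the numbers @zietzm posted above (function name is mine):

```python
def bipartite_density(m, n_sources, n_targets):
    # Fraction of possible source-target pairs that are actual edges.
    return m / (n_sources * n_targets)

# DlA example from the previous comment: 3602 edges, 137 x 402 nodes.
density = bipartite_density(3602, 137, 402)  # ≈ 0.065
```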

Hm, what are the dots near y=0 in these plots?