DWWC recursive and chain-ordering

zietzm commented 6 years ago

References #53

Add two new functions for computing DWWC. The notebook shows that both chain ordering and recursion with caching are a good bit faster than our standard implementation. Note that in the notebook, I performed the computations having dwwc_recursive cache using the cacher metric dwpc. This was to make sure that dwwc was not simply calling the cached result of the recursive function, but to include the actual times we would expect when using caching.

Over the Rephetio metapaths, the two new methods took this many seconds (old method for comparison) in sparse format.

DWWC method	Time for all Rephetio (s)
original	219.110172
recursive	93.744713
chain	70.896228

zietzm commented 6 years ago

I just added another chain multiplication function, this time which does not keep all the arrays in memory at the same time, as was mentioned in #53. The two functions appear to have fairly similar performance, with the dwwc_chain_nomem function a tiny bit faster. We can choose however we prefer to do this.

zietzm commented 6 years ago

You are absolutely right, I had forgotten to configure the cache! I will rerun that and compare.

With the cache, dwwc_recursive takes 36 seconds to compute all Rephetio metapath DWWCs, compared to 3 minutes 49 seconds using the original DWWC function and 1 minute 16 seconds using dwwc_chain_nomem.

Quick note of interest: what appeared to be the fastest dwwc method, dwwc_chain_nomem, computed DWWC for all metapaths length <= 4 in almost exactly 11 hours. Will rerun this evening using the recursive function with a properly set up cache.

zietzm commented 6 years ago

So looking through the runtimes, dwwc_recursive is slower than dwwc on about 20% of metapaths. Over the Rephetio paths, though, dwwc_recursive is clearly much faster, taking 90 rather than 225 seconds to compute all DWWCs. I think the reason dwwc_recursive is faster on some metapaths is that, if not using a cache, it keeps more matrices in memory at once. So while the peak memory usage is pretty close for computations using dwwc and dwwc_recursive, the peak memory usage is slightly higher ultimately for dwwc_recursive.

In computing other metapath DWWCs, dwwc_recursive is slower. Here are the top four metapaths with the biggest time difference in favor of dwwc, ie. those in which it was most an improvement over dwwc_recursive. Nothing really stands out to me with these metapaths. @dhimmel perhaps you can notice something with these that would explain the difference in computation times. I suppose there could also be some element of randomness here, as these times are all quite small.

zietzm commented 6 years ago

It appears not to make a big difference which DWWC method is used for DWPC computation. In the notebook I just added, which-dwpc.ipynb, the time required to compute all Rephetio paths was roughly equivalent between all three methods, with dwwc_chain being slightly faster than the other two.

dhimmel commented 6 years ago

It would be nice to see if we can figure out why recursive-nocache is so much faster than dwwc-nocache (which is equivalent here to dwwc since dwwc is not recursive and has no submetapath caching). Using a profiler like line_profiler could help.

Here are some of the pathological examples:

abbrev	dwwc	dwwc-nocache	recursive	recursive-nocache	chain	chain-nocache
CbGeAeGaD	2.630	2.596	0.310	0.352	0.303	0.306
CbGeAeGdD	2.592	2.566	0.135	0.201	0.159	0.161
CbGeAeGuD	2.575	2.738	0.134	0.189	0.209	0.198
CbGdAeGdD	2.089	1.983	0.070	0.118	0.100	0.091
CbGuAeGaD	2.029	2.020	0.192	0.202	0.153	0.141
CbGdAeGaD	2.014	1.987	0.153	0.203	0.142	0.139
CbGuAeGdD	2.011	1.999	0.102	0.109	0.092	0.094

zietzm commented 6 years ago

Just pushed one more commit. Will rerun notebooks (may take a while) and then this PR should be done.

zietzm commented 6 years ago

Closed #53

greenelab / connectivity-search-analyses

DWWC recursive and chain-ordering #112