Closed zietzm closed 6 years ago
I just added another chain multiplication function, this time which does not keep all the arrays in memory at the same time, as was mentioned in #53. The two functions appear to have fairly similar performance, with the dwwc_chain_nomem
function a tiny bit faster. We can choose however we prefer to do this.
You are absolutely right, I had forgotten to configure the cache! I will rerun that and compare.
With the cache, dwwc_recursive
takes 36 seconds to compute all Rephetio metapath DWWCs, compared to 3 minutes 49 seconds using the original DWWC function and 1 minute 16 seconds using dwwc_chain_nomem
.
Quick note of interest: what appeared to be the fastest dwwc method, dwwc_chain_nomem
, computed DWWC for all metapaths length <= 4 in almost exactly 11 hours. Will rerun this evening using the recursive function with a properly set up cache.
So looking through the runtimes, dwwc_recursive
is slower than dwwc
on about 20% of metapaths. Over the Rephetio paths, though, dwwc_recursive
is clearly much faster, taking 90 rather than 225 seconds to compute all DWWCs. I think the reason dwwc_recursive
is faster on some metapaths is that, if not using a cache, it keeps more matrices in memory at once. So while the peak memory usage is pretty close for computations using dwwc
and dwwc_recursive
, the peak memory usage is slightly higher ultimately for dwwc_recursive
.
In computing other metapath DWWCs, dwwc_recursive
is slower. Here are the top four metapaths with the biggest time difference in favor of dwwc
, ie. those in which it was most an improvement over dwwc_recursive
. Nothing really stands out to me with these metapaths. @dhimmel perhaps you can notice something with these that would explain the difference in computation times. I suppose there could also be some element of randomness here, as these times are all quite small.
It appears not to make a big difference which DWWC method is used for DWPC computation. In the notebook I just added, which-dwpc.ipynb
, the time required to compute all Rephetio paths was roughly equivalent between all three methods, with dwwc_chain
being slightly faster than the other two.
It would be nice to see if we can figure out why recursive-nocache is so much faster than dwwc-nocache (which is equivalent here to dwwc since dwwc is not recursive and has no submetapath caching). Using a profiler like line_profiler could help.
Here are some of the pathological examples:
abbrev | dwwc | dwwc-nocache | recursive | recursive-nocache | chain | chain-nocache |
---|---|---|---|---|---|---|
CbGeAeGaD | 2.630 | 2.596 | 0.310 | 0.352 | 0.303 | 0.306 |
CbGeAeGdD | 2.592 | 2.566 | 0.135 | 0.201 | 0.159 | 0.161 |
CbGeAeGuD | 2.575 | 2.738 | 0.134 | 0.189 | 0.209 | 0.198 |
CbGdAeGdD | 2.089 | 1.983 | 0.070 | 0.118 | 0.100 | 0.091 |
CbGuAeGaD | 2.029 | 2.020 | 0.192 | 0.202 | 0.153 | 0.141 |
CbGdAeGaD | 2.014 | 1.987 | 0.153 | 0.203 | 0.142 | 0.139 |
CbGuAeGdD | 2.011 | 1.999 | 0.102 | 0.109 | 0.092 | 0.094 |
Just pushed one more commit. Will rerun notebooks (may take a while) and then this PR should be done.
Closed #53
References #53
Add two new functions for computing DWWC. The notebook shows that both chain ordering and recursion with caching are a good bit faster than our standard implementation. Note that in the notebook, I performed the computations having
dwwc_recursive
cache using the cacher metricdwpc
. This was to make sure thatdwwc
was not simply calling the cached result of the recursive function, but to include the actual times we would expect when using caching.Over the Rephetio metapaths, the two new methods took this many seconds (old method for comparison) in sparse format.