Open · ltiao opened this issue 1 year ago
Adding a tf.function(...) wrapper to compile the code to XLA speeds up cholesky_update for this problem size from the 1.38 s you saw to around 40-60 ms on my machine (nothing fancy).

Increasing the problem size from 1024 to 2048, the timing for the XLA-compiled cholesky_update increases to about 240 ms (4-6x, which sounds about right for quadratic scaling). The XLA-compiled naive approach at this problem size takes up to about 1.8 s for me.

Main takeaway: always JIT compile (or, at least, always try with and without and do whatever is fastest!), and also remember that big O hides constants that matter! Depending on your problem size, the naive method may be better -- but asymptotically, quadratic scaling will beat cubic :)
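For concreteness, here is a minimal sketch of the kind of wrapping meant above (the exact code used for these timings isn't shown; `jit_compile=True` is the `tf.function` argument that requests XLA compilation, and the function names below are purely illustrative):

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Sketch only: wrap both approaches in tf.function with jit_compile=True so
# that XLA compiles them. Names and shapes here are illustrative.
@tf.function(jit_compile=True)
def rank1_update_xla(chol, u):
    # O(n^2) rank-1 update of an existing Cholesky factor.
    return tfp.math.cholesky_update(chol, u)

@tf.function(jit_compile=True)
def naive_update_xla(a, u):
    # O(n^3) re-factorization of the explicitly updated matrix.
    return tf.linalg.cholesky(a + tf.einsum('i,j->ij', u, u))
```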
A common problem is to compute the Cholesky factor of `A + u @ u.T`, given a PD matrix `A` (shape `n x n`) and a rank-1 update vector `u` (shape `n`). The obvious and naive way is to directly compute the Cholesky factor of `A + u @ u.T`, which has complexity O(n^3). However, suppose we already have the Cholesky factor `L` (shape `n x n`) of `A`; then we can use it to compute the Cholesky factor of `A + u @ u.T` in O(n^2) time.

My understanding is that this is what `tfp.math.cholesky_update` is supposed to implement. However, a simple benchmark shows that the supposedly optimized approach is about 85 times slower than the obvious naive approach!
The optimized approach using `tfp.math.cholesky_update`:
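(A minimal sketch of such a benchmark; the size `n = 1024`, the float64 dtype, the random test data, and the `time.perf_counter` timing below are assumptions rather than the exact code from the report.)

```python
import time

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp

n = 1024  # assumed problem size from the discussion above

# Random symmetric positive-definite matrix A, its Cholesky factor, and a
# rank-1 update vector u.
rng = np.random.default_rng(0)
x = rng.normal(size=(n, n))
a = tf.constant(x @ x.T + n * np.eye(n), dtype=tf.float64)
chol = tf.linalg.cholesky(a)
u = tf.constant(rng.normal(size=n), dtype=tf.float64)

start = time.perf_counter()
chol_updated = tfp.math.cholesky_update(chol, u)  # O(n^2) rank-1 update
print(f"cholesky_update: {time.perf_counter() - start:.3f} s")
```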
The obvious naive approach:
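(Again a sketch, reusing `a`, `u`, and the timing pattern from the block above.)

```python
start = time.perf_counter()
# Form A + u u^T explicitly and re-factorize from scratch: O(n^3), but a
# single call into a heavily optimized kernel.
chol_naive = tf.linalg.cholesky(a + tf.einsum('i,j->ij', u, u))
print(f"naive cholesky:  {time.perf_counter() - start:.3f} s")
```

The two results should agree up to numerical tolerance, e.g. via `np.testing.assert_allclose(chol_updated, chol_naive, atol=1e-8)`, so the comparison is purely about speed.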