Open ttamttam opened 2 years ago
The implementation of linear algebra functions in Owl_base
modules, which is in pure OCaml, is not well supported yet. You may consider using Dense.Ndarray.D
and Owl_linalg.D
modules instead.
I suppose that the problem is here: https://github.com/owlbarn/owl/blob/78b407c0a5f6938dc09c3dc98ee65ffcdcee64f5/src/base/linalg/owl_base_linalg_generic.ml#L400
In order to have a look at the algorithm, I wanted to look at: https://github.com/owlbarn/owl/blob/78b407c0a5f6938dc09c3dc98ee65ffcdcee64f5/src/base/linalg/owl_base_linalg_generic.ml#L381 but I'm answered that it's a bad request. Maybe, because I would need a member access?
Any hint?
And they later improved it: https://www.researchgate.net/profile/Syed-Shah-132/publication/220337321_An_Efficient_and_Generic_Algorithm_for_Matrix_Inversion/links/53ef499a0cf2711e0c42f05a/An-Efficient-and-Generic-Algorithm-for-Matrix-Inversion.pdf
If it's ok, I'll write a PR.
Best regards
@ttamttam Thanks! A PR to fix this issue would be much appreciated!
In the following, n
is the number of rows of the matrix we want to invert.
The current implementation is taken from this article.
Farooq, A., & Hamid, K. (2010). An Efficient and Simple Algorithm for
Matrix Inversion. International Journal of Technology Diffusion (IJTD),
1(1), 20-27. http://doi.org/10.4018/jtd.2010010102
The number of multiplication/division performed by the algorithm is
exactly n³
and the inversion is performed in-place, involving almost no
allocation.
The current implementation could be improved a little by re-ordering some operations, but it is only an implentation detail.
The main problem is that, as it's stated in the publication, it's not numerically stable: it raises an exception if a diagonal element is null.
The same authors also published an improved version of the algorithm:
Farooq, A., Hamid, K., & Shah, I. A. (2010). An Efficient and Generic
Algorithm for Matrix Inversion. International Journal of Technology
Diffusion (IJTD), 1(2), 36-41. http://doi.org/10.4018/jtd.2010040102
It allows the "selection of pivot randomly in the matrix thus supporting partial and full pivoting".
So, we could improve the implementation by selecting off-diagonal pivots (by instance the value with the max amp on each row). The permutations corresponding to the selected pivots would be accumulated in an int array of size n during the computation, and be used at the end to transpose lines and rows of the obtained matrix.
I'm not sure it is really worth it thought. Because even with this strategy, I think we could reach null pivots and fail to finish the computation.
One solution would be to let the implementation as-is, and just try a more
stable one if it fails. And since it's already written, a good candidate would
be to try linsolve_lu m i
(where i is eye n) if a null pivot is reached. In
fact, the implement LU decomposition already performs an implicit full (or
partial?) pivoting, which leads me to think that it is numerically stable.
Another one would be to always use the linsolve_lu
solution.
Let me know what you would prefer?
1- improve the current implementation and add a backup computation with
a linsolve_lu
based one;
2- or simply replace it with the one-line linsolve_lu
implementation. Note
that according to William H. Press, Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (2007). Numerical recipes 3rd edition: The art of scientific computing (3rd ed.). Cambridge University Press., it's computation cost is also around n³
multiplications.
@ttamttam Thanks for the deep dig! I vaguely recall that the current OCaml implementation of inv
was added due to some performance reason. Now that the current implementation may give false result, and the computation complexity is similar, personally I tend to use linsolve_lu
to implement the inv
. It is based on the Numerical Recipes code, and is supposed to be more robust and stable.
@tachukao Calvin, what would be your suggestions about this issue?
@ttamttam thanks for looking at this!
If there's no significant performance gain (sounds to me they're both n^3?), I would also prefer the likely more stable linsolve_lu
implementation.
Is there a problem, or is there something I did not understood?