Open cdeterman opened 8 years ago
Yes, currently there is only an LU factorization, no Cholesky factorization. It should be added, yes, but I don't have any particular plans. Contributions welcome :-)
Hi, I have a blocked Cholesky implementation in OpenCL that outperforms your existing LU factorization with a speedup of 2-5, depending on the GPU. Would you be interested in my contribution?
Kind regards, Rok
Hi, sure, all contributions are welcome! :-) Doall yourblocks need to be the same size? Does it work well across many different block sizes?
Maybe the term block cholesky is not the best. My implementation is inspired by this ( https://www.thinkmind.org/download.php?articleid=iccgi_2013_11_30_10133 ) and some other papers. It basically recursively solves the decomposition as described in these 6 steps:
It may not look promising at first glance, but it turns out that is. The speedup does depend on the block size (it does require trying some tuning - manually trying some options). But generally it performs well for almost any block size.
Ok, I see. This 'blocked' algorithm is actually similar to how our LU factorization is implemented (and similar to how most other libraries implement it).
Ok. If you think this would be of any help to you, I would gladly contribute my work. Do you have any guidelines for adding features or should I just try to replicate the style in which your other features are implemented?
Here's a guideline on coding style: http://viennastar.iue.tuwien.ac.at/wiki/doku.php?id=codingstyle I recommend creating a pull request once your contribution is ready for review. If you could also provide a CUDA- and OpenMP-version of your factorization, it would definitely help - but it's not mandatory.
Great, thanks!
I know there is the Incomplete Cholesky Factorization, which I understand is a 'sparse' approximation of cholesky, but is there currently an existing or planned implementation of a full Cholesky Factorization of dense matrices?