Closed lorenzo2beretta closed 5 months ago
Hi Lorenzo,
Thanks for the input (and for using Matfree in the first place)!
Your suggestion sounds good to me. Would you be willing to implement this change yourself by making a pull request? I would only be able to get to this next week at the earliest.
If so, it would be great if the PR description included a speed comparison of before vs after -- nothing special -- only a quick confirmation that the new approach is indeed faster.
What do you think?
Also, don't be surprised: I retitled the issue to reflect your type of request more closely. I hope you don't mind :)
Absolutely, thanks for retitling and for answering so promptly :) That was the very first GitHub issue I ever raised; I apologize if it was in bad shape.
I will be able to work on this after the ICML deadline. You'll see my pull request at some point after that.
Awesome that this was your first GitHub issue -- please keep them coming! :v:
Great! Feel free to let me know if you have any questions :)
I am using matfree for large-scale neural network algorithms and I was expecting the running time of your Lanczos implementation to be dominated by the term
lanczos_iterations * Time(matrix-vector product).
However, the algorithm is much slower than that, and I suspect the cause is a suboptimal implementation of the re-orthogonalization step. The current implementation scans through the orthonormal basis of the Krylov space and orthogonalizes the new vector against each basis vector in turn, sequentially. In practice, this is done with lax.scan.
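For concreteness, here is a minimal sketch (hypothetical, not Matfree's actual code) of the sequential re-orthogonalization pattern described above, where `basis` holds one orthonormal Krylov vector per row:

```python
import jax
import jax.numpy as jnp


def orthogonalize_scan(v, basis):
    """Subtract the projection of v onto each basis vector, one at a time."""

    def body(carry, b):
        # Remove the component of the running vector along basis vector b.
        return carry - jnp.dot(b, carry) * b, None

    # basis has shape (k, n); lax.scan visits its rows sequentially.
    v_ort, _ = jax.lax.scan(body, v, basis)
    return v_ort
```

Each of the k scan steps depends on the previous one, so the k projections cannot run in parallel.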
I suggest an alternative, data-parallel implementation. Let B be an n x k matrix whose columns form an orthonormal basis of the current Krylov space, and let v be the vector to be orthogonalized. We compute
v_ort = v - B (B^T v)
where ^T denotes the transpose. The parentheses indicate the order in which the operations are executed: computing B^T v first keeps the cost at two matrix-vector products, i.e. O(nk), rather than forming the n x n matrix B B^T.
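As a hypothetical sketch (function name mine, not Matfree's API), the proposed data-parallel variant replaces the whole scan with a single pair of matrix-vector products:

```python
import jax
import jax.numpy as jnp


def orthogonalize_blocked(v, B):
    """Orthogonalize v against the orthonormal columns of B in one step."""
    # Parentheses enforce B^T v first: two O(nk) matvecs, no n x n matrix.
    return v - B @ (B.T @ v)
```

Because B has orthonormal columns, this yields the same result as subtracting the k projections sequentially, but the two matmuls map well onto accelerator hardware.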
Cheers, Lorenzo