The current implementation of the singular value decomposition is naive and in most circumstances infeasible to use, since it computes for an input matrix $X$ both, the covariance matrix $X^\top X$ and the Gram matrix (inner products) $XX^\top$.
This defeats the purpose of the SVD for a very important performance optimization use case of PCA.
For PCA of datasets with high dimensionality (e.g. d > 1000, or d >> n) it is quite costly to compute the covariance matrix and then compute the eigen decomposition of it. Instead we can compute the eigen decomposition implicitly through the SVD without computing the covariance matrix. Like so: $Q\Lambda Q^{-1} = X^\top X = (USV^\top)^\top (USV^\top) = V S U^\top U S V^\top = VS^2V^\top$. This means that the SVD $X = USV^\top$ suffices to get the eigen decomposition of $X^\top X$, i.e., $Q=V$ (orthonormal eigenvectors), $\Lambda = S^2$ (diagonal matrix of eigenvalues).
But for this to give a performance benefit, the SVD implemenmtation cannot be based on computing covariance and Gram matrix (since we want to avoid their computation). Instead an implementation such as described by Golub and Reinsch would be needed.
I know that this is not the easiest algorithm to implement and it seems that there already has been an attempt in implementing it. There is also svd-js that solely implements this algorithm in javascript.
I think druidjs would benefit a lot from a decent svd implementation. But maybe its also good enough to use another lib to compute the svd when needed. What do you think?
The current implementation of the singular value decomposition is naive and in most circumstances infeasible to use, since it computes for an input matrix $X$ both, the covariance matrix $X^\top X$ and the Gram matrix (inner products) $XX^\top$. This defeats the purpose of the SVD for a very important performance optimization use case of PCA.
For PCA of datasets with high dimensionality (e.g. d > 1000, or d >> n) it is quite costly to compute the covariance matrix and then compute the eigen decomposition of it. Instead we can compute the eigen decomposition implicitly through the SVD without computing the covariance matrix. Like so: $Q\Lambda Q^{-1} = X^\top X = (USV^\top)^\top (USV^\top) = V S U^\top U S V^\top = VS^2V^\top$. This means that the SVD $X = USV^\top$ suffices to get the eigen decomposition of $X^\top X$, i.e., $Q=V$ (orthonormal eigenvectors), $\Lambda = S^2$ (diagonal matrix of eigenvalues).
But for this to give a performance benefit, the SVD implemenmtation cannot be based on computing covariance and Gram matrix (since we want to avoid their computation). Instead an implementation such as described by Golub and Reinsch would be needed.
I know that this is not the easiest algorithm to implement and it seems that there already has been an attempt in implementing it. There is also svd-js that solely implements this algorithm in javascript.
I think druidjs would benefit a lot from a decent svd implementation. But maybe its also good enough to use another lib to compute the svd when needed. What do you think?