styvon / cPCG

R package for solving systems of linear equations using the (preconditioned) conjugate gradient algorithm, with improved efficiency via the C++ Armadillo linear algebra library and flexibility for user-specified preconditioning methods.
GNU General Public License v3.0

Allow initial coefficient values (vector x) to be passed as an argument? #1

Open tomwenseleers opened 5 years ago

tomwenseleers commented 5 years ago

I was just wondering whether it would be a lot of trouble to allow the initial coefficient estimates x to be passed as an argument x in the functions cgsolve, pcgsolve and the two sparse versions, with rep(0, ncol(A)) as the default? It's just that I am dealing with an application in which I already have reasonable initial estimates available, so I would expect a speedup if I started from there...
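To make the request concrete, here is a minimal base-R sketch of a conjugate gradient solver with a warm-start argument; the function name cg_warm and the extended signature are hypothetical illustrations, not part of cPCG's actual API:

```r
# Sketch of conjugate gradient with an optional starting vector x.
# The default rep(0, ncol(A)) reproduces the usual cold start.
cg_warm <- function(A, b, x = rep(0, ncol(A)), tol = 1e-6, maxIter = 1000) {
  r <- b - A %*% x            # residual reflects the starting guess
  p <- r                      # first search direction
  rs_old <- sum(r * r)
  for (i in seq_len(maxIter)) {
    Ap <- A %*% p
    alpha <- rs_old / sum(p * Ap)
    x <- x + alpha * p
    r <- r - alpha * Ap
    rs_new <- sum(r * r)
    if (sqrt(rs_new) < tol) break
    p <- r + (rs_new / rs_old) * p   # conjugate update of direction
    rs_old <- rs_new
  }
  drop(x)
}
```

Starting from a good guess, the initial residual is already small, so the loop should terminate in fewer iterations than a cold start for the same tolerance.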

styvon commented 5 years ago

Hi Tom, thanks for the suggestion! Will work on this one in the next release.

tomwenseleers commented 4 years ago

Many thanks for that! I just tried adding it, and the timings did not go down quite as much as I had hoped (they only decreased by 10-20% for problem sizes of n=200, p=100 and n=2000, p=1000). That is, even when I passed a pre-calculated solution as the initial values, it still took more time than I expected to return an updated solution. Would you happen to know where the bottleneck could be? I had thought it should then give a solution near-instantaneously, but maybe I'm missing something... Or is there anything else I could pass between two subsequent cgsolve fits, aside from just the coefficient starting values? PS Checking further, it seems the bottleneck was that I was using cgsolve to solve a ridge regression, using

beta_cgsolve <- cgsolve(A = crossprod(X) + lambda * diag(ncol(X)),
                        b = crossprod(X, y),
                        start = beta_solve, tol = 1e-6, maxIter = 1000)

and the main bottleneck of course is then the calculation of A = crossprod(X) + lambda*diag(ncol(X)). I am now trying Eigen::LeastSquaresConjugateGradient in Rcpp together with solveWithGuess to pass initial values (https://eigen.tuxfamily.org/dox/classEigen_1_1LeastSquaresConjugateGradient.html), which has the advantage that it doesn't require forming crossprod(X) - I just need to row-augment my covariate matrix X with sqrt(lambda)*diag(ncol(X)) first to solve the ridge regression problem...
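The row-augmentation trick mentioned above can be verified in plain R; this is an illustrative sketch with made-up data (X, y, lambda are arbitrary here), using qr.solve as a stand-in for a least-squares CG solver:

```r
# Ridge regression as an augmented least-squares problem:
# minimising ||y - X b||^2 + lambda ||b||^2 is equivalent to ordinary
# least squares on the row-augmented system, so crossprod(X) is never formed.
set.seed(1)
X      <- matrix(rnorm(20 * 5), 20, 5)
y      <- rnorm(20)
lambda <- 0.5
p      <- ncol(X)

X_aug <- rbind(X, sqrt(lambda) * diag(p))  # append sqrt(lambda) * I rows
y_aug <- c(y, rep(0, p))                   # append matching zeros

# Both routes give the same ridge coefficients:
beta_normal <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
beta_aug    <- qr.solve(X_aug, y_aug)      # least squares on augmented system
all.equal(drop(beta_normal), beta_aug)
```

In the Rcpp/Eigen setting, the same X_aug and y_aug would be handed to LeastSquaresConjugateGradient, with solveWithGuess supplying the warm start.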

styvon commented 4 years ago

What does your symmetric positive-definite matrix A look like? If it has a large condition number, the algorithm might take long to converge. Have you tried preconditioners other than Jacobi?
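For reference, the condition number can be checked quickly in base R; this is a generic sketch with a synthetic SPD matrix, not tied to cPCG:

```r
# Conditioning check for a symmetric positive-definite matrix A.
set.seed(1)
M <- matrix(rnorm(100 * 10), 100, 10)
A <- crossprod(M) + diag(10)   # SPD by construction

kappa(A, exact = TRUE)         # 2-norm condition number

# For SPD matrices this equals the eigenvalue ratio:
ev <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
max(ev) / min(ev)
```

A condition number near 1 means unpreconditioned CG should converge quickly; very large values suggest trying a stronger preconditioner.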

tomwenseleers commented 4 years ago

The condition number is 29.52542, so that should be OK, right? It's not sparse, and it looks like this: [attached image: Rplot]

cgsolve doesn't have any options to specify other preconditioners, right? For pcgsolve, Jacobi was the fastest for my problem...