spacedome / FEASTSolver.jl

FEAST eigensolver implementation in Julia
MIT License

No method error in gen_feast! #3

Open rleegates opened 4 years ago

rleegates commented 4 years ago

First of all, thank you for the great package!

This line gives me a no method error with the arguments being sparse: https://github.com/spacedome/FEASTSolver.jl/blob/ed5470aee37dd09a65e1919264efd2d3528e06ea/src/feast.jl#L122

spacedome commented 4 years ago

Thanks, and thanks for the bug report!

I am working on a large refactor at the moment, to move from this being my research code to more of an actual library. This should improve sparse matrix support, as it is working already for the nonlinear methods.

As of now, the gen_feast! method may not work at all; I'll add a test/example. The feast! method should be fine (though only for standard problems). I have not tested sparse matrices, since when I wrote the linear feast methods the IterativeSolvers.jl package had some issues for my use case. Enough of those are fixed now and I have gotten it to work, but I am unsure about performance. I may eventually have it use an iterative solver by default when called with a sparse argument; right now it would use a direct solve.

I probably will rewrite the method entirely, I'll leave this issue open until it works on sparse matrices.

rleegates commented 4 years ago

Is there something that I can do to contribute to the generalized version?

With respect to the solver: as a direct solver, I have had varying mileage with UMFPACK, and I would generally discourage its use for anything substantial. While I see the need for iterative solvers, in the midsize regime, let's say up to 20M unknowns, a direct solver may still be competitive. My current approach to linear systems in Julia is MUMPS3.jl in double precision up to about 1M, and MUMPS in single precision out-of-core as a preconditioner to an iterative solver. This appears to work quite well. Although they are quite simple, I could also contribute these code snippets.
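
Roughly, the preconditioner idea looks like this (just a sketch: UMFPACK's built-in lu stands in for the MUMPS factorization, rounding the matrix through Float32 mimics a single-precision factor, and the names and sizes are made up for illustration):

# Sketch: factor a reduced-precision copy of A and use it as a left
# preconditioner (Pl) for a double-precision GMRES solve with IterativeSolvers.jl.
using LinearAlgebra, SparseArrays, IterativeSolvers

n = 200_000
A = spdiagm(-1 => fill(-1.0, n - 1), 0 => fill(2.5, n), 1 => fill(-1.2, n - 1))
b = rand(n)

A_lo = Float64.(Float32.(A))     # round entries through single precision
F = lu(A_lo)                     # "single-precision" factorization (MUMPS stand-in)

x, history = gmres(A, b; Pl = F, log = true)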

spacedome commented 4 years ago

Thank you! I am refactoring the code currently (unfortunately it will likely be one large commit, to keep everything mostly working). I should be able to fix the sparse issue; if I run into trouble I will let you know.

That is very helpful to know; the linear system solve is the main computational bottleneck of the algorithm. I have only recently started using sparse matrices with this code. The cutoff point at which iterative solvers become competitive might change within FEAST, since the linear systems can be solved inexactly. I am less familiar with this aspect of the algorithms, but for example in the fortran implementation the linear systems are solved only in single precision (I believe by default). I would be very interested in using MUMPS over UMFPACK, though I would prefer not to add it as a dependency until they have a BinaryBuilder package, which, reading the open issues, might happen soon?

rleegates commented 4 years ago

Well, I agree on the dependency. I have modified the code in feastjl (the other package) to support passing a user-defined function for the solver. That gave me a considerable performance boost when using MUMPS in place of UMFPACK. This solves the dependency issue and would also solve it here. I will try single precision and report back; this is also what I have read in the official FEAST manual. I also read somewhere that there is a function for subspace-size estimation; do you know anything about that?

spacedome commented 4 years ago

This is the approach taken by fortran FEAST: the user can solve the linear system through a reverse communication interface. I will set it up as a function that can be passed in. If your problems are small enough, you can store the factorizations to be reused after the first iteration, giving a very large performance improvement. This would be harder to make an interface for; I will implement this (stored factorizations) with UMFPACK for now, and replace it with MUMPS when the dependency issue is fixed.
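
Roughly, the stored-factorization idea would look something like the following (a sketch with made-up names, not the package's actual internals; the systems at the quadrature nodes z_k have the usual generalized-FEAST form (z_k*B - A) Y = B*X):

# Factor (z_k*B - A) once per quadrature node, then reuse the factor objects on
# every FEAST iteration instead of refactoring.
using LinearAlgebra, SparseArrays

factorize_nodes(A, B, nodes) = [lu(z * B - A) for z in nodes]

# inside each iteration only triangular solves remain:
function apply_node_solves!(Ys, facts, B, X)
    BX = complex.(B * X)            # right-hand side block, promoted to match the complex factors
    for (Y, F) in zip(Ys, facts)
        ldiv!(Y, F, BX)             # solve with the stored factor; no refactorization
    end
    return Ys
end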

The contour integration of FEAST can be used to give a stochastic estimate of the number of eigenvalues in the contour (reference I have on hand). Convergence rate depends heavily on subspace size; a subspace roughly twice the number of eigenvalues in the contour is generally good. It is also possible to resize the subspace after the first iteration if the initial size is too big/small (at the cost of reallocating memory). I can implement the stochastic estimate and subspace resizing; they are useful, but I haven't really needed them for what I was doing.

rleegates commented 4 years ago

Storing the factorization is simple in MUMPS3: initialize a mumps object, compute the factorization out-of-core and then (perhaps repeatedly) call mumpssolve! on this object. The factorization should then stay on disk (perhaps distributed) and MUMPS will handle the retrieval during the back-substitution process. Just loop over the integration points and generate OOC-MUMPS objects - that should be all.

The paper you mention would be of great use to me; however, it appears as though only the Hermitian generalized eigenvalue problem is treated. Is the process easily adapted to the non-Hermitian generalized problem? By the way, the problem I am attempting to solve is (iA - aB)e = 0 with Hermitian positive-definite B and non-Hermitian A. I am thinking of implementing RFEAST or XFEAST, but these also seem to be treated only for the Hermitian case.

spacedome commented 4 years ago

That seems a much better solution than how I had been doing it previously (just an array of factorization objects). It should give a large performance increase, since this is typically the most expensive step in each iteration.

Almost everything can be applied to non-Hermitian problems; I think the Hermitian case is usually presented just to simplify the analysis. From the end of the stochastic estimate paper: "the whole technique extends to non-Hermitian case without any difference." Similarly, standard FEAST can sometimes be used effectively on non-Hermitian problems. The non-Hermitian FEAST just additionally iterates dual left/right subspaces, which gives bi-B-orthogonal subspaces and improved stability (both sometimes important; I'm not sure if you need this for your problem).

I am unsure how the expanding subspace approaches of XFEAST and RFEAST compare to the current FEAST approach. The fortran code now is more similar to IFEAST, using the block residual vectors to update the subspace (a residual inverse iteration), and is what allows for solving the linear systems in reduced precision. This approach is not entirely compatible with RFEAST, which uses the block residual vectors to expand, but should be fine with XFEAST. They should both work in the non-Hermitian case.

rleegates commented 4 years ago

Well, I guess you'll still hold that array of factorization objects; it's just that in the case I mentioned, the data is stored on disk internally by MUMPS.

I modified the feastjl code to the point that it has somewhat bearable performance, but I don't really want to go much further there, since I'm basically waiting on your updates. When I get hold of your refactored code, I'll post some benchmarks. My test problem is currently about 75k x 75k, which, despite its small size, is still quite slow in the modified feastjl implementation (pre-factorized, double precision). Your code looks much better from what I can see, and depending on how my mileage varies when wiring it to MKLSparse.jl and a MUMPS3.jl (single-precision) preconditioner + IterativeSolvers.jl, I'd really like to be able to go up to about 4-5M.

Okay, that's really good to know and makes a lot of sense. Right now, I'm just warming up to the algorithm settings, so maybe, with those in place, I won't need to go into the newer modifications. The stochastic estimate together with the subspace enlargement that you plan on implementing will really help in zeroing in on those settings for each spectrum slice.

spacedome commented 4 years ago

Brendan (who wrote the feastjl code) worked in the same lab as me, I used a bit of that code initially, he probably will not continue working on it now that he has graduated.

I have not particularly optimized my code yet; the fortran code will likely be much faster (I have not written Julia bindings to it yet, though). I just ran the code on a 10k x 10k sparse problem and it took about 30s. Annoyingly, the UMFPACK solvers are not threaded like the LAPACK ones are, which I had been relying on to avoid implementing parallelism until now. This would be the next step, as the algorithm is very parallel, though I am not sure what the best approach would be, possibly Distributed.jl or MPI.jl. I will try to add some automated benchmarks.
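
Since the contour points are independent, the simplest version would be something like this (just a sketch with made-up names; Distributed.jl or MPI.jl would distribute the same loop across processes instead of threads):

# Parallelize the per-node factorizations/solves with Julia's built-in threading.
using LinearAlgebra, SparseArrays

function solve_nodes_threaded(A, B, nodes, X)
    BX = complex.(B * X)                     # shared right-hand side block
    Ys = Vector{Matrix{ComplexF64}}(undef, length(nodes))
    Threads.@threads for k in eachindex(nodes)
        # each node's factorization and solve is independent of the others
        Ys[k] = (nodes[k] * B - A) \ BX
    end
    return Ys
end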

I have added an argument for passing a user-defined linear system solve. It should take the same parameters as ldiv!; the default is set to the following. I can modify this if you need; in particular, there might be a way to modify it so that you can precompute and store factorizations, which would probably just require passing an index instead of the matrix C.

# lu and ldiv! come from LinearAlgebra; for sparse C the lu dispatches to UMFPACK
function lu_solve!(Y, C, X)
    # factor C and overwrite Y with C \ X
    ldiv!(Y, lu(C), X)
end
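
In the meantime, a (hypothetical) user-side solver could cache factorizations itself, something like the sketch below. This assumes the same matrix objects are passed back on later iterations, which may not hold in the current code:

# Cache keyed by matrix identity: factor C the first time it is seen, reuse afterwards.
using LinearAlgebra

const FACT_CACHE = IdDict{Any, Any}()

function cached_lu_solve!(Y, C, X)
    F = get(FACT_CACHE, C, nothing)
    if F === nothing
        F = lu(C)                  # factor once per distinct matrix object
        FACT_CACHE[C] = F
    end
    ldiv!(Y, F, X)
end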

I have refactored the code to use the IFEAST residual inverse iteration approach, but it does not seem to work with single precision solves at the moment. Storing the factorizations also seems to not be giving the performance increase it should, I need to take another look at both of these.

I'll put subspace estimate & enlargement on the list (#5) so I don't forget to add them.

spacedome commented 4 years ago

I just implemented the stochastic estimate, see the file stochastic.jl. This is a very simple implementation, which perhaps will make it easier to modify. Since it is essentially the first part of a FEAST iteration, the factorizations could be stored and then used with FEAST. I have it solve larger block systems once, but it is usually presented as solving many single vector systems and averaging, or something in between.

It will generally give an underestimate; the bias can be reduced by using a more accurate contour (i.e. more nodes), or in the Hermitian case by using a circular contour with Gauss quadrature instead of trapezoidal, which I have also just implemented. It all has to do with the shape of the rational function and how well it approximates an indicator function for the contour region. Also, in the Hermitian case (for all of the algorithms) the number of linear system solves can generally be halved due to symmetry, which I have not implemented yet.
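
For reference, the estimate amounts to roughly the following (a rough sketch of the idea, not the actual stochastic.jl code; the quadrature weights are assumed to already include the 1/(2*pi*im) factor of the contour integral):

# Hutchinson-style trace estimate of the approximate spectral projector:
# count ≈ mean over random probes v of vᵀ(P v), with P applied via contour quadrature.
using LinearAlgebra, SparseArrays

function estimate_eig_count(A, B, nodes, weights; nvecs = 20)
    n = size(A, 1)
    V = rand([-1.0, 1.0], n, nvecs)                  # Rademacher probe block
    PV = zeros(ComplexF64, n, nvecs)
    for (z, w) in zip(nodes, weights)
        PV .+= w .* ((z * B - A) \ complex.(B * V))  # quadrature for P applied to V
    end
    return real(sum(conj.(V) .* PV)) / nvecs
end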

rleegates commented 4 years ago

That looks great. I just went over the code but noticed that the non-Hermitian FEAST with left and right eigenvectors is not quite implemented yet, so I'll wait on that. If you need help, please let me know.

I gave the interface a little update for passing a factorizer function: https://github.com/spacedome/FEASTSolver.jl/pull/8

spacedome commented 4 years ago

The dual left/right non-Hermitian FEAST doubles the number of linear system solves, so if you do not need both subspaces or the additional stability, using the standard FEAST may be better.

I should be able to implement the non-Hermitian algorithm soon; I just spoke with one of the authors about some details. There is a subspace resizing step that I will leave out for now, which means that for too large a search subspace the reduced system becomes singular and LAPACK will probably give an error.

Thanks for the pull request!

spacedome commented 4 years ago

I just added a quick implementation of dual_gen_feast! and it seems to more or less work on the test problem I tried (a grcar matrix, standard non-Hermitian, see test/non_hermitian.jl). There are almost certainly some bugs, and it is not at all optimized; I'm trying to get it working reliably first. I will have to spend some time looking more at the fortran code. If you are able to test it on your problem, it would be great to know whether it works, as I do not have any generalized non-Hermitian problems.
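
In the meantime, a small synthetic problem of that type can be cooked up for sanity checks (random non-Hermitian A, Hermitian positive-definite B, compared against a dense reference), something like:

# Toy generalized non-Hermitian test problem with a dense reference solution.
using LinearAlgebra

n = 200
A = randn(n, n)                    # non-Hermitian
M = randn(n, n)
B = M' * M + n * I                 # Hermitian positive definite
vals = eigvals(A, B)               # dense reference to compare a FEAST run against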

rleegates commented 4 years ago

Ok, upon a quick look, this didn't seem to work for me. I will try to get you some test matrices. What's the desired format? MAT file would probably be easiest, but I could also write IJV tuples into CSV.

spacedome commented 4 years ago

It didn't quite work right the last time I tried to implement it either, so I probably will have to take another look at the fortran code. That would be great; either MTX or MAT files work.
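
For MTX, something like MatrixMarket.jl's mmwrite should do it (placeholder matrices below; in practice these would be the assembled A and B):

# Write sparse matrices as Matrix Market files, which stay sparse and language-agnostic.
using LinearAlgebra, SparseArrays, MatrixMarket

A = sprand(1000, 1000, 0.01)
B = sprand(1000, 1000, 0.01) + 10.0I
MatrixMarket.mmwrite("A.mtx", A)
MatrixMarket.mmwrite("B.mtx", B)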