xcompact3d / x3d2

https://xcompact3d.github.io/x3d2
BSD 3-Clause "New" or "Revised" License

Poisson solver #32

Closed semi-h closed 6 months ago

semi-h commented 7 months ago

The Poisson solver is the final part we need for running simulations with x3d2. The plan is to have an FFT-based direct solver and use it whenever we can, and also to have a CG-based iterative solver. So I came up with this structure to support both methods and decide which one to run at run time.

A few important considerations so that the structure makes more sense:

Closes #55
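To make the direct-solver idea concrete: for a periodic problem, an FFT-based Poisson solve amounts to transforming the right-hand side, dividing by minus the squared wavenumber, and transforming back. This is a minimal NumPy sketch of that idea in 1D, not x3d2 code (x3d2 itself is Fortran, and all names here are illustrative):

```python
import numpy as np

def poisson_fft_1d(f, L=2 * np.pi):
    """Solve u'' = f with periodic BCs via FFT; returns the zero-mean solution."""
    n = f.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)  # integer wavenumbers for domain length L
    f_hat = np.fft.fft(f)
    u_hat = np.zeros_like(f_hat)
    nz = k != 0
    u_hat[nz] = -f_hat[nz] / k[nz] ** 2         # u_hat = -f_hat / k^2; k = 0 mode stays zero
    return np.fft.ifft(u_hat).real

# verify with u = sin(x), so f = u'' = -sin(x)
n = 64
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u = poisson_fft_1d(-np.sin(x))
assert np.allclose(u, np.sin(x), atol=1e-10)
```

The same divide-by-(-k²) structure carries over to 3D with a product of wavenumber grids; the complexity in practice is in the distributed transposes, which is what cuFFTMp handles later in this thread.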

pbartholomew08 commented 7 months ago

To get good performance (at scale) you need a preconditioner. The preconditioner is typically based on an actual matrix, e.g. a low order discretisation. I'm working on an implementation of this here https://github.com/3decomp/poissbox/pull/20, once I've got the preconditioner sorted I'll port it over to x3d2. The high-order Laplacian is working and a star-based preconditioner is implemented, but it's not currently effective.

I don't think we want to implement CG ourselves though!

Note that the "serial" aspect of the linked PR is only due to the TDMA implementation I'm using to keep things simple, the CG and AMG preconditioner are parallel, and the solver can be run using the low-order discretisation in parallel.
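For readers following along, the matrix-free preconditioned CG being discussed has a small core; the preconditioner slot is where the low-order/AMG machinery above would plug in. A minimal NumPy sketch (illustrative only; the real work would go through PETSc rather than hand-rolled code, as noted above):

```python
import numpy as np

def pcg(apply_A, b, apply_Minv=None, tol=1e-10, maxiter=200):
    """Matrix-free preconditioned conjugate gradients for SPD A."""
    if apply_Minv is None:
        apply_Minv = lambda r: r          # identity preconditioner
    x = np.zeros_like(b)
    r = b - apply_A(x)
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxiter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = apply_Minv(r)                 # preconditioner applied once per iteration
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# SPD test problem: 1D Dirichlet Laplacian applied matrix-free
n = 50
def apply_A(v):
    out = 2 * v.copy()
    out[:-1] -= v[1:]
    out[1:] -= v[:-1]
    return out

b = np.ones(n)
x = pcg(apply_A, b, apply_Minv=lambda r: r / 2.0)   # Jacobi (diagonal) preconditioner
assert np.linalg.norm(apply_A(x) - b) < 1e-8
```

Note the operator is only ever touched through `apply_A`, which is exactly the property that lets the high-order compact-scheme Laplacian be used without assembling a matrix.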

pbartholomew08 commented 7 months ago
* A matrix-free CG based solver requires evaluating a Laplacian operation.

  * I think it is a good idea to implement a laplacian subroutine in the solver class. We might require this in the solver later on just like gradient and divergence operations.

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

JamieJQuinn commented 7 months ago

The overall structure looks good. I like the simple calling code in the solve. Seems unlikely that the CG solver won't require some backend-specific stuff (so might need its own class) but the generic procedure can handle that case well, and can extend to other options if needed.

semi-h commented 7 months ago

I don't think we want to implement CG ourselves though!

At the current stage the codebase is only able to pick between two dummy functions, poisson_fft or poisson_cg at runtime via the function pointers. I think poisson_cg can then issue more sophisticated calls to PETSc for example. If you think this looks good I'm happy to leave poisson_cg as is so that you can port your stuff when ready. My plan was to complete only the fft solver in this PR so that we can test the codebase in full, but just wanted to implement a way to pick between two methods we'll definitely have in the codebase soon.
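The run-time selection described here can be sketched as follows. This mirrors the Fortran procedure-pointer approach in spirit only; the names (`poisson_fft`, `poisson_cg`, `Solver`) follow the thread, but the code is an illustrative Python sketch, not x3d2's API:

```python
# Sketch of run-time solver selection via a bound function pointer.

def poisson_fft(pressure, rhs):
    # placeholder for the FFT-based direct solve
    return "direct FFT solve"

def poisson_cg(pressure, rhs):
    # placeholder for the iterative solve (e.g. delegating to PETSc)
    return "iterative CG solve"

class Solver:
    def __init__(self, method):
        # bind the chosen method once, at initialisation
        self.poisson = poisson_fft if method == "fft" else poisson_cg

solver = Solver("fft")
result = solver.poisson(None, None)   # call sites need no if/else branching
```

The benefit is that every call site stays a single `solver.poisson(...)` call, and swapping methods is a one-line change in the configuration.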

semi-h commented 7 months ago

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But I think the gradient and divergence subroutines in the solver aren't exactly the ones you need, right? Because they switch between the velocity grid and the pressure grid. I think the iterative solver will require gradient and divergence operations that both start and end on the pressure grid. If so, we'll need a new set of tdsops to support this operation and two new grad and div subroutines later on.

semi-h commented 7 months ago

The overall structure looks good. I like the simple calling code in the solve. Seems unlikely that the CG solver won't require some backend-specific stuff (so might need its own class) but the generic procedure can handle that case well, and can extend to other options if needed.

Actually we can access the backend stuff from the solver; all methods in the solver already do. For example, a naive CG implementation would only require a Laplacian and vector inner products, I guess. And technically we could add the FFT-related stuff into the backends and not require a separate class either, but I didn't do it for two reasons.

But if we're doing the CG stuff via PETSc, then it's very likely that we'll need a class to keep PETSc-related complexity away from the backends too.

pbartholomew08 commented 7 months ago

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But I think the gradient and divergence subroutines in the solver aren't exactly the ones you need, right? Because they switch between the velocity grid and the pressure grid. I think the iterative solver will require gradient and divergence operations that both start and end on the pressure grid. If so, we'll need a new set of tdsops to support this operation and two new grad and div subroutines later on.

The pattern of execution is p (p grid) -> grad p (v grid) -> lapl p (p grid), because then this is consistent with the use of grad(p) in the momentum equations and the evaluation of div(u). I'll try just evaluating the Laplacian directly using 2nd derivatives, maybe this will work better.
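The p-grid → v-grid → p-grid pattern can be seen in a 1D periodic staggered-grid sketch, where composing a centre-to-face gradient with a face-to-centre divergence recovers the familiar compact 3-point Laplacian. This is an illustrative NumPy sketch of the discretisation pattern, not x3d2's actual (high-order compact) operators:

```python
import numpy as np

# 1D periodic staggered grid: p lives at cell centres, grad p at faces.
n, L = 32, 2 * np.pi
dx = L / n
xc = (np.arange(n) + 0.5) * dx            # pressure (cell-centre) grid

def grad(p):                              # centre -> face, 2nd order
    return (p - np.roll(p, 1)) / dx

def div(u):                               # face -> centre, 2nd order
    return (np.roll(u, -1) - u) / dx

def lapl(p):                              # composition: p grid -> p grid
    return div(grad(p))

p = np.sin(xc)
# the composition reproduces the standard 3-point Laplacian exactly
assert np.allclose(lapl(p), (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2)
```

Building the Laplacian this way keeps it consistent, by construction, with the grad(p) used in the momentum equations and the div(u) used in the pressure equation, which is the point being made above.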

semi-h commented 6 months ago

The only remaining part is making the reorder functions operate on complex arrays. I think I can get them working tomorrow.

JamieJQuinn commented 6 months ago

This fixes #55

semi-h commented 6 months ago

Just to let you know when reviewing: all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp. Also, the plans in the init function will be adapted to cuFFTMp, and the fft_forward and fft_backward functions will be simplified down to single-line calls to cuFFTMp.

The only thing I want to add before we merge is a periodic call to the curl and divergence functions to report enstrophy and the divergence of the velocity field as the simulation goes on. We need a simple inner product function in the backends to finalise that.
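The inner product needed here is just a discrete volume integral of a pointwise product; enstrophy then follows from applying it to the vorticity components. A minimal NumPy sketch of that idea (illustrative only; in x3d2 this would be a backend routine with an MPI reduction across ranks, and the function names here are assumptions):

```python
import numpy as np

def inner_product(a, b, dx, dy, dz):
    """Discrete volume integral of a*b on a uniform grid."""
    return np.sum(a * b) * dx * dy * dz

def enstrophy(wx, wy, wz, dx, dy, dz):
    """0.5 * integral of |omega|^2 over the domain."""
    return 0.5 * (inner_product(wx, wx, dx, dy, dz)
                  + inner_product(wy, wy, dx, dy, dz)
                  + inner_product(wz, wz, dx, dy, dz))

# sanity check on a uniform field: |omega|^2 = 3, volume = (4*0.5)^3 = 8
w = np.ones((4, 4, 4))
assert np.isclose(enstrophy(w, w, w, 0.5, 0.5, 0.5), 12.0)
```

The same `inner_product` primitive also covers the vector dot products a CG solver needs, so one backend routine serves both purposes.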

JamieJQuinn commented 6 months ago

Just to let you know when reviewing, all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp.

Can you mark these with a comment like "TODO remove after cuFFTMp" and create an issue for it?

semi-h commented 6 months ago

A call for last round of review folks!

The input settings in the main file are arranged so that they produce the result I shared in the Slack channel: basically a TGV from t = 0 to 20 on a 256^3 grid.

The output subroutine is not very good, but for the time being it is useful for seeing whether anything is going wrong with the TGV, because it prints the enstrophy. I expect this will evolve a lot into more useful output machinery later.

Just to let you know when reviewing, all the reorder and reshape functions in complex.f90 will be removed when we switch to cuFFTMp.

Can you mark these with a comment like "TODO remove after cuFFTMp" and create an issue for it?

They won't be called once the cuFFTMp stuff is in, and then, like any redundant function, they'll need to be removed. However, we might want to compare the performance of the two approaches, so for a short while both the current FFT solver and cuFFTMp may coexist. I think this will be clearer with the cuFFTMp PR, so I'm not going ahead with the TODO comments for now.

pbartholomew08 commented 6 months ago

For consistency of the discretisation the Laplacian operator should be built by combining the div and grad operators.

This is very interesting, I wasn't expecting this at all. But I think the gradient and divergence subroutines in the solver aren't exactly the ones you need, right? Because they switch between the velocity grid and the pressure grid. I think the iterative solver will require gradient and divergence operations that both start and end on the pressure grid. If so, we'll need a new set of tdsops to support this operation and two new grad and div subroutines later on.

The pattern of execution is p (p grid) -> grad p (v grid) -> lapl p (p grid), because then this is consistent with the use of grad(p) in the momentum equations and the evaluation of div(u). I'll try just evaluating the Laplacian directly using 2nd derivatives, maybe this will work better.

Adding for reference (I think we've discussed this in meetings): the direct evaluation of the Laplacian using high-order methods for 2nd derivatives on the pressure grid seems to work well (as a standalone solver).

semi-h commented 6 months ago

Last commit fixed most issues raised.

There are a few comments about the reshape and complex_reorder functions. I admit their naming scheme is terrible and there is a lot to do to make them better, but there is a very good chance that cuFFTMp will work fine and they'll all be removed from the codebase entirely very soon. Because we expect their lifespan to be very short, I don't want to invest time in a proper naming scheme, adding comments, changing the way we set threads/blocks when calling them, or testing them, and I'm willing to merge the PR as is. If we decide to keep them longer for any reason, I'll be happy to implement all the suggestions.

semi-h commented 6 months ago

It's a good idea to have a unit test for the Poisson solver, but I don't want to implement it here as the PR is already large. I opened an issue (#74) to deal with it later on.