[JOSS review] Review of the paper

thelfer commented 5 months ago

does not require a discretization of the free space.

This sentence is unclear (to me). Techniques like FEM has the same "advantage", no ?

Section Solver Performance

The performance statements are pure CPUs. Is GPU's usage transparent ? What about scalability ? To what extent is the FFT part the dominant part of the computation (80%, 90%, 99%) ?

otvam commented 5 months ago

This sentence is unclear (to me). Techniques like FEM has the same "advantage", no ?

For electromagnetic problems (I know this it not the case for other types of equations), the free-space (e.g. air, vacuum) has a critical contribution to the solution as the electromagnetic fields can propagate and store energy in the vacuum.

For differential equation based methods (such as FEM), this is handled by having a large bounding box around the magnetic component. This means that the volume that is discretized is much larger than the component itself.

For integral equation based methods (such as PEEC), the partial elements (inductance and potential matrix) are already considering an infinitely large bounding box. This implies that the free-space does not need to be discretized.

Section Solver Performance

This is a very interesting and complex question and the answer is a bit "it depends"...

For static problems without magnetic materials (very special case), the sparse preconditioner is already solving the problem (in this case the FFT is only used for post-processing the field). I am observing the following split for computational time (without GPUs):

85% to 95% => non-FFT
5% to 15% => FFT

For non-static problems without magnetic materials, the computational cost of the sparse preconditioner is not negligible. I am observing the following split for computational time (without GPUs):

5% to 30% => non-FFT
70% to 95% => FFT

For problems with magnetic materials, the sparse preconditioner is less effective and the FFT is doing the lion's share of the job. I am observing the following split for computational time (without GPUs):

1% to 5% => non-FFT
95% to 99% => FFT

It can be concluded that the FFT acceleration has a critical impact. However, a complete analysis of the computational cost would extend the paper by several pages and would be out of scope for JOSS (original research contribution).

The simple benchmark problem in the paper is mainly included to give a rough idea of the scalability of the solver. The choice of providing this benchmark without GPU has been done for the following reasons:

This benchmark is on the conservative side (would be even better with GPU)
The benchmark is simpler (no mixing between CPU and GPU performance)
CUDA is not open source (and therefore not installed with the default PyPEEC package)

The GPU usage is transparent (done through the CuPy library). The only thing the user has to do is activate the GPU usage in the tolerance.yaml file (given that CUDA is properly installed).

I clarified in the paper that the benchmark is done with CPU and without GPU.

thelfer commented 5 months ago

For integral equation based methods (such as PEEC), the partial elements (inductance and potential matrix) are already considering an infinitely large bounding box. This implies that the free-space does not need to be discretized.

Thanks for the clarification

The GPU usage is transparent (done through the CuPy library). The only thing the user has to do is activate the GPU usage in the tolerance.yaml file (given that CUDA is properly installed).

This is not described in the installation guide. It shall be as the package claims to be GPU ready, IMHO.

thelfer commented 5 months ago

I clarified in the paper that the benchmark is done with CPU and without GPU.

I think that this was clear enough in the paper, but claiming GPU support and showing CPU benchmarks are a bit disapointing. See below for a way to alleviate this to a large extent.

For problems with magnetic materials, the sparse preconditioner is less effective and the FFT is doing the lion's share of the job. I am observing the following split for computational time (without GPUs):

1% to 5% => non-FFT

95% to 99% => FFT

This statement can be incorporated very easily in the paper. The following sentence is quite straightforward IMHO:

"For problems with magnetic materials, we observed that the FFT resolution may take the vast majority of the computational effort, typically more than 95% in some cases. In those cases, the performances of PyPEEC are directly driven by the FFT solver."

In the Solver Performance section, evaluating the time passed in the FFT Solver would help. Indeed, if this example is such that the FFT solver is dominant, then the performance claims does not even have to be discussed much, IMHO.

If this is not the case, a user not expert in the field (which I am) can't say much about this section.

Also, if 99% of the time is spend in the FFT solver, it clearly indicates that GPU support is straighforward and will be efficient.

otvam commented 5 months ago

This is not described in the installation guide. It shall be as the package claims to be GPU ready, IMHO.

Yes, this is a good point. This is now fixed with the commit 3268a15a6d352d9328e0d95efd694365de826c78. The GPU (and other numerical options) are now described in the documentation.

otvam commented 5 months ago

A paragraph on the FFT impact on the performance has been added to the paper: bf3848d0646d54486d4499a2e3e814930beaec06.

Part of the problem is that the GPU rig I am using is completely outdated (K80 GPU from 2014 with a CPU from 2013). Even with the old GPU, the FFTs are faster than on a modern CPU but the non-GPU operations (sparse preconditioner) are slowing the rest of the code. So, using this hardware for a benchmark would be quite pointless. I have hopes to get access to something better soon. If this happen while the paper is still in review, I will add it to the benchmark.

thelfer commented 5 months ago

The paragraph is fair IMHO

otvam / pypeec

[JOSS review] Review of the paper #8