zer011b / fdtd3d

fdtd3d is an open source 1D, 2D, 3D FDTD electromagnetics solver with MPI, OpenMP and CUDA support for x64, ARM, ARM64, RISC-V, PowerPC architectures
GNU General Public License v2.0
115 stars, 33 forks

I have some questions #198

Open · YouthTIAN opened this issue 2 months ago

YouthTIAN commented 2 months ago

Thank you for this great software, I have some questions:

  1. Does this code include the pre-processing and post-processing parts of FDTD, and if so, are they carried out on the CPU or GPU?
  2. Is the accuracy guaranteed, and has it been compared with other commercial software?
  3. Does PML also support parallel grid and dynamic grid?

Thank you

zer011b commented 2 months ago
  1. Please specify what pre/post-processing you mean. In general, the current approach is to pre/post-process data using third-party tools (for example, bash scripts, i.e. on the CPU); for this purpose fdtd3d supports input/output in text format, which can be easily read and modified. I suppose some tools can do the processing on the GPU too.
  2. You can find tests in https://github.com/zer011b/fdtd3d/tree/master/Tests/suite. They test different scenarios and compare numerical solutions with analytical ones, for example https://github.com/zer011b/fdtd3d/blob/master/Tests/suite/t7.1/README.md. Accuracy should be guaranteed; however, if you find any issues, please share them. I also haven't done any precise comparison with commercial solvers thus far.
  3. Parallel/dynamic grid is orthogonal to PML: it is just a parallelization mechanism that doesn't affect the overall result, but improves simulation time. Parallel grid splits Yee grid points between multiple nodes (i.e. processes), so some nodes might have PML grid points and some might not. This applies to all fields and other arrays that correspond to Yee grid points. So, yes, you can run MPI-parallelized simulations with PML. Dynamic grid is a further step in this direction: it can change the number of grid points assigned to a node during computation based on that node's performance. However, it's not finished and not supported right now. You can find more details about both in the papers mentioned at the end of the main README in the repository.
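To make point 3 concrete: when the grid points along an axis are split into contiguous blocks, only the nodes whose block overlaps a PML layer hold PML points. Below is a minimal 1D sketch, assuming an even contiguous split where the remainder goes to the first nodes. This is not necessarily how fdtd3d's parallel grid distributes points, and `split_axis` / `pml_points_per_node` are hypothetical helpers.

```python
def split_axis(n_points: int, n_nodes: int) -> list[tuple[int, int]]:
    """Split n_points along one axis into n_nodes contiguous half-open
    ranges [start, end); the first (n_points % n_nodes) nodes get one
    extra point. Hypothetical scheme, not fdtd3d's exact partitioning."""
    base, extra = divmod(n_points, n_nodes)
    ranges = []
    start = 0
    for node in range(n_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def pml_points_per_node(n_points: int, n_nodes: int, pml: int) -> list[int]:
    """Count each node's points inside the PML layers [0, pml) and
    [n_points - pml, n_points). Assumes 2 * pml <= n_points."""
    counts = []
    for start, end in split_axis(n_points, n_nodes):
        left = max(0, min(end, pml) - start)                # overlap with left PML
        right = max(0, end - max(start, n_points - pml))    # overlap with right PML
        counts.append(left + right)
    return counts


if __name__ == "__main__":
    # 100 points along x, 4 nodes, 10-point PML on each side
    ranges = split_axis(100, 4)
    counts = pml_points_per_node(100, 4, 10)
    for node, (rng, n_pml) in enumerate(zip(ranges, counts)):
        print(f"node {node}: points {rng}, PML points {n_pml}")
```

With 100 points, 4 nodes and a 10-point PML, only nodes 0 and 3 hold PML points; the two middle nodes have none.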
YouthTIAN commented 1 month ago

Thank you very much! Another question: what are the degree of parallelism and the parallel efficiency of this software on CPU/GPU?

zer011b commented 1 month ago

Currently there are three different modes of parallelism supported: MPI, CUDA, and MPI+CUDA. Each of these can have different parallelism characteristics, which can also be affected by the device, OS, compiler, and so on. Simulation parameters can also have a significant effect, such as grid size, the modes in use, sizes of parallel buffers, and so on. So it's better to test on your device of interest. For some simple scenarios, see the fdtd3d benchmark page: https://zer011b.github.io/fdtd3d/.
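When testing on your own device, speedup and parallel efficiency are usually computed from measured wall-clock times for the same problem size (strong scaling). Below is a minimal sketch using the standard definitions, taking the smallest measured process/GPU count as the baseline. `scaling_table` is a hypothetical helper, not part of fdtd3d, and the input times must come from your own runs.

```python
def scaling_table(times: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Strong-scaling speedup and efficiency relative to the smallest
    measured process count p0:
        S(p) = T(p0) / T(p)
        E(p) = p0 * T(p0) / (p * T(p))
    `times` maps process (or GPU) count -> wall-clock time for the same
    problem size."""
    p0 = min(times)
    t0 = times[p0]
    return {
        p: (t0 / t, (p0 * t0) / (p * t))
        for p, t in sorted(times.items())
    }
```

Call it as `scaling_table({1: t1, 2: t2, 4: t4})` with your measured times. An efficiency close to 1.0 means near-linear scaling; values that drop as `p` grows usually indicate communication or load-imbalance overhead.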

YouthTIAN commented 1 month ago

Thank you. Moreover, I would like to know which virtual topology of the grid (x, y, z, xy, xz, yz, xyz) across all computational nodes has the best efficiency for a 3D model on a supercomputer. If I only assign x, y or z, is there an obvious difference in this case? I guess xyz should be better, but I'm not sure whether the more complicated connections and communication between neighboring nodes will decrease the efficiency instead. Besides, we know that the computation for nodes with a PML layer should be larger. Will this affect the efficiency heavily? Thanks.

zer011b commented 1 month ago

I would like to know which virtual topology of the grid (x, y, z, xy, xz, yz, xyz) across all computational nodes has the best efficiency for a 3D model on a supercomputer.

This question doesn't have a simple answer.

First of all, it depends on the actual simulation area. Consider the following simulation examples:

  1. x=10000, y=10, z=10
  2. x=1000, y=1000, z=1000

But this is not all, because the choice of virtual topology also depends on the characteristics of the target device. For example:

For the simulation examples above, with 2 nodes you can split just one axis, giving x=5000, y=10, z=10 and x=500, y=1000, z=1000 per node, respectively. With 8 nodes it can be x=1250, y=10, z=10 and x=500, y=500, z=500 per node.
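The reasoning above can be sketched as a brute-force search: enumerate every way to factor the node count into px × py × pz and pick the split with the smallest total internal face area, a rough proxy for how much halo data has to be exchanged every time step. This is a simplified surface-to-volume heuristic assumed for illustration. It is not fdtd3d's selection logic (which is described in the papers referenced in the README), and it ignores latency, the physical network layout and load imbalance. `topologies`, `cut_area` and `best_topology` are hypothetical names.

```python
from itertools import product


def topologies(n_nodes: int):
    """Yield every (px, py, pz) with px * py * pz == n_nodes."""
    for px, py in product(range(1, n_nodes + 1), repeat=2):
        if n_nodes % (px * py) == 0:
            yield px, py, n_nodes // (px * py)


def cut_area(grid, topo) -> int:
    """Total area of the internal faces between subdomains: the (p - 1)
    cuts along each axis are planes spanning the other two axes."""
    (nx, ny, nz), (px, py, pz) = grid, topo
    return (px - 1) * ny * nz + (py - 1) * nx * nz + (pz - 1) * nx * ny


def best_topology(grid, n_nodes: int):
    """Return (topology, per-node size) with the smallest total cut area.
    Ignores latency, physical network layout and load imbalance."""
    # Every node must get at least one grid point along each axis.
    valid = [t for t in topologies(n_nodes)
             if all(p <= n for p, n in zip(t, grid))]
    # On ties, prefer the candidate that splits x more, then y (arbitrary).
    topo = min(valid, key=lambda t: (cut_area(grid, t), tuple(-p for p in t)))
    # Per-node size, rounded down when an axis doesn't divide evenly.
    sizes = tuple(n // p for n, p in zip(grid, topo))
    return topo, sizes


if __name__ == "__main__":
    for grid in [(10000, 10, 10), (1000, 1000, 1000)]:
        for n_nodes in (2, 8):
            print(grid, n_nodes, best_topology(grid, n_nodes))
```

For the grids 10000×10×10 and 1000×1000×1000 this reproduces the splits described above: 2×1×1 for both grids with 2 nodes (for the cubic grid all three one-axis splits tie, and the tie-break picks x), and 8×1×1 versus 2×2×2 with 8 nodes.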

Besides, ideally the virtual topology should match the physical one; otherwise data exchange might become inefficient (for example, data might be sent through multiple transit nodes).

For homogeneous architectures, fdtd3d has built-in automatic virtual topology selection (see the log at the beginning of the simulation), but it has some requirements:

If any of those is not met, fdtd3d might not select the optimal virtual topology automatically (but you can always set the virtual topology manually). For heterogeneous architectures, dynamic grid is planned, which will be able to automatically adjust the amount of computation on each node (it's not finished yet; only manual setup of the virtual topology is available). You can find more details about how it selects the virtual topology either in the source code or in the papers mentioned at the end of the main README in the repository.

Besides, we know that the computation for nodes with a PML layer should be larger. Will this affect the efficiency heavily?

Potentially, yes, this might affect the optimal virtual topology, but I don't think that's currently the case for PML, because PML and non-PML grid points require the same amount of computation. It is the case for TFSF, though. In any case, dynamic grid is planned for such situations, but for now only manual setup of the topology is available.

So, to sum up, you need to test it on your simulation and on your device.

YouthTIAN commented 1 week ago

Thank you very much. I tested 200×200×200 and 1000×1000×1000 models on my homogeneous supercomputer. However, I found that when I set the virtual topology as x > y > z rather than x=y=z, the efficiency is the best. Is it normal? Besides, I would like to know whether there are some papers about the study of virtual topology, especially some mathematical formulas about that. For example, in this paper https://ieeexplore.ieee.org/document/1606757, the conclusion is "As to the same dimensional virtual topology, the topology scheme should be created along the directions where the amount of the FDTD grids is larger." Does it make sense? Thank you!

zer011b commented 1 week ago

However, I found that when I set the virtual topology as x > y > z rather than x=y=z, the efficiency is the best. Is it normal?

As I mentioned in https://github.com/zer011b/fdtd3d/issues/198#issuecomment-2151646962, it depends on the actual device you use. Even if all nodes have the same performance, the virtual topology should match the physical one (the actual connections between nodes), because communication speed can differ between different pairs of nodes.

I would like to know whether there are some papers about the study of virtual topology, especially some mathematical formulas about that

You can check the papers mentioned in the README (https://github.com/zer011b/fdtd3d?tab=readme-ov-file#how-to-cite); they contain a mathematical analysis of the best virtual topology, and fdtd3d uses the same logic in its code to identify it.

"As to the same dimensional virtual topology, the topology scheme should be created along the directions where the amount of the FDTD grids is larger." Does it make sense?

It's true, but it lacks details on when this holds. For example, for a 2D grid x=10000, y=10 it's clear that the x axis should be divided among all nodes, whether there are 10 nodes or 100. However, this doesn't say what to do in the x=1200, y=600 case with 6 nodes: should each node get x=200, y=600, or x=400, y=300? The papers I've mentioned above describe how to choose between these.
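For the x=1200, y=600 case with 6 nodes, one simple way to compare the options is the total length of the internal boundaries each split creates, since the amount of halo data exchanged per step grows with it. Below is a minimal sketch under that assumption: a surface-to-volume proxy only, not the analysis from the papers, and it ignores latency and network layout. `candidates_2d` is a hypothetical helper.

```python
def candidates_2d(nx: int, ny: int, n_nodes: int):
    """List (px, py, per-node size, total cut length) for every
    px * py == n_nodes, sorted by cut length (smallest first)."""
    rows = []
    for px in range(1, n_nodes + 1):
        if n_nodes % px:
            continue
        py = n_nodes // px
        if px > nx or py > ny:
            continue  # some node would get no points along an axis
        # (px - 1) cuts of length ny plus (py - 1) cuts of length nx
        cut = (px - 1) * ny + (py - 1) * nx
        rows.append((px, py, (nx // px, ny // py), cut))
    return sorted(rows, key=lambda r: r[3])


if __name__ == "__main__":
    for px, py, size, cut in candidates_2d(1200, 600, 6):
        print(f"{px}x{py}: {size[0]}x{size[1]} per node, cut length {cut}")
```

Under this metric the 3×2 split (400×300 per node) has a total cut length of 2400, versus 3000 for 6×1 (200×600 per node). For a strongly elongated grid such as x=10000, y=10 it selects splitting only x, in line with the quoted conclusion.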