Open YouthTIAN opened 2 months ago
Thank you very much! Another question: what degree of parallelism and parallel efficiency does this software achieve on CPU/GPU?
Currently three modes of parallelism are supported: mpi, cuda, and mpi+cuda. Each of these can have different parallelism characteristics, which are also affected by the device, OS, compiler, and so on. Simulation parameters can also have a significant effect: grid size, which modes are used, sizes of parallel buffers, and so on. So it's better to test on your device of interest. For some simple scenarios, here's the fdtd3d benchmark page: https://zer011b.github.io/fdtd3d/.
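When testing on your own device, speedup and parallel efficiency can be computed from measured wall-clock times. A minimal sketch, using the usual definitions (speedup = serial time / parallel time, efficiency = speedup / number of workers); the function name and the timings in the usage note are hypothetical and are not taken from fdtd3d or its benchmark page:

```python
def parallel_efficiency(t_serial, t_parallel, workers):
    """Return (speedup, efficiency) for one measured run.

    t_serial   -- wall-clock time of the single-worker run, in seconds
    t_parallel -- wall-clock time of the run on `workers` workers, in seconds
    workers    -- number of MPI processes (or devices) used in the parallel run
    """
    if t_serial <= 0 or t_parallel <= 0 or workers <= 0:
        raise ValueError("timings and worker count must be positive")
    speedup = t_serial / t_parallel
    # Efficiency of 1.0 would mean ideal linear scaling.
    return speedup, speedup / workers
```

For example, with made-up timings of 120 s on one process and 20 s on 8 processes, this gives a speedup of 6.0 and an efficiency of 0.75.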
Thank you. Moreover, I would like to know which kind of Virtual topology of grid (x, y, z, xy, xz, yz, xyz) through all computational nodes has the best efficiency for 3D model on the supercomputer. If I only assign x, y or z, is there an obvious difference between them in this case? I guess xyz should be better, but I'm not sure whether the more complicated connections and communication between neighboring nodes will decrease the efficiency instead. Besides, we know that the calculation for the nodes with PML layer should be larger. Will this affect the efficiency heavily? Thanks.
I would like to know which kind of Virtual topology of grid (x, y, z, xy, xz, yz, xyz) through all computational nodes has the best efficiency for 3D model on the supercomputer.
This question doesn't have a simple answer.
First of all, it depends on the actual simulation area. Consider the following simulation examples:
- x=10000, y=10, z=10: in this case it (probably) makes sense to split just the x axis to reduce the amount of sent data and maximize performance.
- x=1000, y=1000, z=1000: in this case it (probably) makes sense to split all three axes to reduce the amount of sent data and maximize performance.
But this is not all, because the virtual topology also depends on the characteristics of the target device. For example:
For the simulation examples above, in the case of 2 nodes you can split just one axis, so it'll be x=5000, y=10, z=10 and x=500, y=1000, z=1000 on both nodes. In the case of 8 nodes it can be x=1250, y=10, z=10 and x=500, y=500, z=500 on each node.
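The per-node sizes above, and a rough idea of how much data is shared, can be sketched as follows. This is only an illustration under a simplified assumption: it counts the grid cells lying on the internal cut planes (one layer per cut) and ignores latency, buffer sizes, the number of field components, and the physical topology. The function name is hypothetical, and this is not the selection logic used inside fdtd3d.

```python
def subgrid_and_cut_cells(grid, topology):
    """Return (per-node subgrid size, total cells on internal cut planes).

    grid     -- (nx, ny, nz) size of the whole simulation area
    topology -- (px, py, pz) number of nodes along each axis
    """
    for size, parts in zip(grid, topology):
        if parts < 1 or size % parts != 0:
            raise ValueError("each axis must divide evenly between nodes")
    nx, ny, nz = grid
    px, py, pz = topology
    subgrid = (nx // px, ny // py, nz // pz)
    # Splitting an axis into p parts creates p - 1 cut planes; each plane
    # spans the full extent of the other two axes.
    cut_cells = (px - 1) * ny * nz + (py - 1) * nx * nz + (pz - 1) * nx * ny
    return subgrid, cut_cells


if __name__ == "__main__":
    for grid in [(10000, 10, 10), (1000, 1000, 1000)]:
        for topology in [(8, 1, 1), (2, 2, 2)]:
            print(grid, topology, subgrid_and_cut_cells(grid, topology))
```

Under this model, with 8 nodes the x=10000, y=10, z=10 grid shares 700 cells when only x is split versus 200100 cells for a 2x2x2 split, while the x=1000, y=1000, z=1000 grid shares 7000000 cells when only x is split versus 3000000 for 2x2x2.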
Besides, ideally the virtual topology should match the physical one, because otherwise data sharing might become inefficient (for example, data might be sent through multiple transit nodes).
For homogeneous architectures there's an automatic virtual topology selection built into fdtd3d (see the log at the beginning of the simulation), but it has some requirements:
If any of those is not met, fdtd3d might not select the optimal virtual topology automatically (but you can always set the virtual topology manually). For heterogeneous architectures, a dynamic grid is planned, which will be able to automatically adjust the amount of computation on each node (it is not finished yet; only manual setup of the virtual topology is available). You can find more details about how fdtd3d selects the virtual topology either in the source code or in the papers mentioned at the end of the main README in the repository.
Besides, we know that the calculation for the nodes with PML layer should be larger. Will this affect the efficiency heavily?
Yes, potentially this might affect the optimal virtual topology, but I think that's not the case for PML for now, because non-PML and PML grid points require the same amount of computation. It is the case for TFSF, though. Anyway, a dynamic grid is planned for such cases, but for now only manual setup of the topology is available.
So, to sum up, you need to test it on your simulation and on your device.
Thank you very much. I tested 200x200x200/1000x1000x1000 model on my homogeneous supercomputer. However, I found that when I set the virtual topology as x > y > z rather than x=y=z, the efficiency is the best. Is it normal? Besides, I would like to know whether there are some papers about the study of virtual topology, especially some mathematical formulas about that. For example, in this paper https://ieeexplore.ieee.org/document/1606757, the conclusion is "As to the same dimensional virtual topology, the topology scheme should be created along the directions where the amount of the FDTD grids is larger." Does it make sense? Thank you!
However, I found that when I set the virtual topology as x > y > z rather than x=y=z, the efficiency is the best. Is it normal?
As I mentioned in https://github.com/zer011b/fdtd3d/issues/198#issuecomment-2151646962, it depends on the actual device that you use. Even if all nodes have the same performance, the virtual topology should match the physical one (the actual connections between nodes), because communication speed can differ between pairs of nodes.
I would like to know whether there are some papers about the study of virtual topology, especially some mathematical formulas about that
You can check the papers mentioned in the README (https://github.com/zer011b/fdtd3d?tab=readme-ov-file#how-to-cite); they contain a mathematical analysis of the best virtual topology, and fdtd3d uses the same logic in its code to identify it.
"As to the same dimensional virtual topology, the topology scheme should be created along the directions where the amount of the FDTD grids is larger." Does it make sense?
It's true, but it lacks details about when this holds and when it doesn't. For example, for a 2D grid with x=10000, y=10 it's clear that the x axis should be divided between the nodes, whether there are 10 nodes or 100. However, this doesn't say what to do in the x=1200, y=600 case with 6 nodes. Should it be x=200, y=600 on each node, or x=400, y=300 on each node? The papers I've mentioned above describe how to choose between these.
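To make the x=1200, y=600 example concrete, here is a rough sketch that enumerates the 2D topologies for a given node count and ranks them by the number of grid cells lying on the internal cut lines. This is a simplified surface-only model that I'm assuming for illustration; it ignores latency, per-message overhead, and the physical topology, and it is not the analysis from the papers or the code used by fdtd3d. The function name is hypothetical.

```python
def rank_topologies_2d(nx, ny, nodes):
    """Rank 2D virtual topologies (px, py) with px * py == nodes.

    Returns a list of (cut_cells, (px, py), (sub_x, sub_y)) tuples sorted by
    cut_cells, the total number of cells on internal cut lines.
    Topologies that do not divide the grid evenly are skipped.
    """
    results = []
    for px in range(1, nodes + 1):
        if nodes % px != 0:
            continue
        py = nodes // px
        if nx % px != 0 or ny % py != 0:
            continue
        # px - 1 cuts across x, each ny cells long;
        # py - 1 cuts across y, each nx cells long.
        cut_cells = (px - 1) * ny + (py - 1) * nx
        results.append((cut_cells, (px, py), (nx // px, ny // py)))
    return sorted(results)


if __name__ == "__main__":
    for entry in rank_topologies_2d(1200, 600, 6):
        print(entry)
```

Under this model, the 3x2 topology (x=400, y=300 on each node) has 2400 cells on cut lines, while the 6x1 topology (x=200, y=600 on each node) has 3000. For x=10000, y=10 with 10 nodes, splitting only x gives 90 cells, far fewer than any other split. On a real machine the ranking can change once communication latency and the physical connections between nodes are taken into account.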
Thank you for this great software, I have some questions:
Thank you