lammps / lammps

Public development project of the LAMMPS MD software package
https://www.lammps.org
GNU General Public License v2.0

Coul/slater GPU support #3918

Closed: Eddy-Barraud closed this issue 9 months ago

Eddy-Barraud commented 1 year ago

Summary

Accurately simulating charged particles in DPD involves using a smeared (Slater-type) charge, which is available in the pair style coul/slater [Minerva González-Melchor, J. Chem. Phys. 125, 224107 (2006) https://doi.org/10.1063/1.2400223]. However, this pair style does not yet have GPU acceleration.
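For reference, here is a minimal sketch of the Slater-smeared pair interaction. It assumes the functional form E(r) = C q_i q_j / (ε r) [1 - (1 + r/λ) exp(-2r/λ)], where λ is the decay length, as given in the coul/slater documentation. The force follows from F = -dE/dr. The function names are hypothetical, and this is plain Python, not LAMMPS code. Unlike a point charge, the energy stays finite as r → 0, where it tends to C q_i q_j / (ε λ). It converges to the bare Coulomb law for r ≫ λ.

```python
import math


def slater_pair(r, qi, qj, lam, coulomb_const=1.0, epsilon=1.0):
    """Energy and radial force between two Slater-smeared charges (hypothetical helper).

    E(r) = C qi qj / (eps r) * [1 - (1 + r/lam) exp(-2 r/lam)]
    F(r) = -dE/dr = C qi qj / (eps r^2) * [1 - (1 + 2r/lam + 2r^2/lam^2) exp(-2 r/lam)]
    A positive F means repulsion.
    """
    k = coulomb_const * qi * qj / epsilon
    x = r / lam
    expo = math.exp(-2.0 * x)
    energy = k / r * (1.0 - (1.0 + x) * expo)
    force = k / (r * r) * (1.0 - (1.0 + 2.0 * x + 2.0 * x * x) * expo)
    return energy, force


def point_pair(r, qi, qj, coulomb_const=1.0, epsilon=1.0):
    """Bare point-charge Coulomb energy and force, for comparison."""
    k = coulomb_const * qi * qj / epsilon
    return k / r, k / (r * r)


if __name__ == "__main__":
    lam = 0.25  # decay length, matching the 0.25 used in the issue's example input
    for r in (0.01, 0.1, 0.5, 1.0, 3.0):
        es, fs = slater_pair(r, 1.0, -1.0, lam)
        ep, _ = point_pair(r, 1.0, -1.0)
        print(f"r={r:5.2f}  E_slater={es:10.4f}  E_point={ep:10.4f}  F_slater={fs:10.4f}")
```

The smearing only matters at distances comparable to λ. Beyond a few λ, the Slater correction is exponentially small.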

Detailed Description

It would be great to have /gpu acceleration for this style to speed up charged DPD simulations. Currently, I use a hybrid/overlay pair style to add these electrostatic interactions on top of the DPD forces. Here is an example:

pair_style    hybrid/overlay dpd 1 1 ${seed} coul/slater/long 0.25 3
kspace_style    pppm       5e-04
pair_coeff 1 2  coul/slater/long
pair_coeff 1 2  dpd 78.000  4.500

The problem is that the coul/slater/long pair style computes the Coulomb interactions on the CPU. This is less efficient than the GPU-accelerated dpd style and slows down the entire simulation.
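Here is a minimal sketch of the real-space part of what coul/slater/long evaluates per pair. It assumes a standard Ewald-type split. Only the bare 1/r tail of the Slater energy is long-ranged, so that tail is divided into erfc(g r)/r, evaluated in real space, and erf(g r)/r, left to a k-space solver such as pppm. The short-ranged Slater correction -(1 + r/λ) exp(-2r/λ)/r stays entirely in real space. The function name and the g_ewald value are hypothetical. In LAMMPS, g_ewald is normally derived from the kspace accuracy setting. This is a sketch, not LAMMPS source.

```python
import math


def slater_long_real_space(r, qi, qj, lam, g_ewald, coulomb_const=1.0, epsilon=1.0):
    """Real-space energy and force of a Slater-smeared pair under an Ewald-type split.

    Full Slater energy:  k/r - k (1 + r/lam) exp(-2r/lam) / r
    The k/r tail is split as k erfc(g r)/r (here) + k erf(g r)/r (k-space, not computed here).
    """
    k = coulomb_const * qi * qj / epsilon
    x = r / lam
    expo = math.exp(-2.0 * x)
    gr = g_ewald * r
    erfc_term = math.erfc(gr)
    # Energy: screened Coulomb tail minus the short-ranged Slater correction.
    energy = k / r * (erfc_term - (1.0 + x) * expo)
    # Force = -dE/dr, using d/dr[erfc(g r)/r] = -erfc(g r)/r^2 - (2g/sqrt(pi)) exp(-g^2 r^2)/r.
    ewald_part = erfc_term + 2.0 * gr / math.sqrt(math.pi) * math.exp(-gr * gr)
    slater_part = (1.0 + 2.0 * x + 2.0 * x * x) * expo
    force = k / (r * r) * (ewald_part - slater_part)
    return energy, force


if __name__ == "__main__":
    g_ewald = 2.0  # hypothetical splitting parameter, not the value pppm 5e-04 would pick
    for r in (0.1, 0.5, 1.0, 3.0):
        e, f = slater_long_real_space(r, 1.0, -1.0, 0.25, g_ewald)
        print(f"r={r:4.1f}  E_real={e:12.6f}  F_real={f:12.6f}")
```

Every pair inside the real-space cutoff needs an erfc and two exponentials. That per-pair cost is what makes a GPU version attractive.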

Example NaCl input code

This example gives the following performance:

Performance: 127587.608 tau/day, 147.671 timesteps/s
81.9% CPU use with 18 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 10.146     | 13.2       | 16.399     |  44.0 | 19.60
Bond    | 0.00044933 | 0.0005119  | 0.000608   |   0.0 |  0.00
Kspace  | 42.707     | 42.773     | 42.837     |   0.5 | 63.51
Neigh   | 1.436      | 1.4621     | 1.5116     |   1.5 |  2.17
Comm    | 3.6228     | 5.1928     | 6.6668     |  33.2 |  7.71
Output  | 0.062657   | 0.098654   | 0.12243    |   5.1 |  0.15
Modify  | 0.032233   | 0.034493   | 0.045962   |   1.7 |  0.05
Other   |            | 4.591      |            |       |  6.82

Nlocal:        333.333 ave         342 max         323 min
Histogram: 2 0 0 3 1 5 3 2 0 2
Nghost:        1581.22 ave        1624 max        1546 min
Histogram: 2 0 2 3 4 3 2 1 0 1
Neighs:              0 ave           0 max           0 min
Histogram: 18 0 0 0 0 0 0 0 0 0
FullNghs:        21017 ave       21789 max       20157 min
Histogram: 1 1 2 2 3 3 1 2 1 2

Total # of neighbors = 378306
Ave neighs/atom = 63.051
Ave special neighs/atom = 0
Neighbor list builds = 1097
Dangerous builds = 0

---------------------------------------------------------------------
      Device Time Info (average): 
---------------------------------------------------------------------
Data Transfer:   1.9874 s.
Neighbor copy:   0.6474 s.
Neighbor unpack: 0.0000 s.
Force calc:      9.8804 s.
Device Overhead: 12.2551 s.
Average split:   1.0000.
Lanes / atom:    4.
Vector width:    32.
Max Mem / Proc:  1.22 MB.
CPU Cast/Pack:   0.1797 s.
CPU Driver_Time: 0.1045 s.
CPU Idle_Time:   0.0296 s.
---------------------------------------------------------------------

---------------------------------------------------------------------
    Device Time Info (average) for kspace: 
---------------------------------------------------------------------
Data Out:        0.1658 s.
Data In:         1.8316 s.
Kernel (map):    17.4533 s.
Kernel (rho):    0.2267 s.
Force interp:    9.9578 s.
Total rho:       17.8457 s.
Total interp:    11.7894 s.
Force copy:      0.1334 s.
Total:           29.7686 s.
CPU Poisson:     12.3615 s.
CPU Data Cast:   0.0110 s.
CPU Idle Time:   27.8582 s.
Max Mem / Proc:  1.30 MB.
---------------------------------------------------------------------

job.log

On large systems, using the GPU is no longer advantageous.
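The %total column of the MPI task timing breakdown gives a quick bound on what accelerating a single section can achieve. Below is a back-of-the-envelope Amdahl's law sketch. It assumes only the chosen section gets faster while everything else stays the same. With GPU offload, the per-section timers may not cleanly separate host and device work, so treat the result as rough.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when only `fraction` of the run time is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # %total values copied from the timing breakdown in the log above
    shares = {"Pair": 0.1960, "Kspace": 0.6351}
    for section, p in shares.items():
        for s in (2.0, 10.0, float("inf")):
            print(f"{section:6s} x{s:>5}: overall speedup {amdahl_speedup(p, s):.2f}")
```

With these shares, an infinitely fast Pair section would cap the overall gain for this run at about 1/(1 - 0.196) ≈ 1.24x. Kspace, at 63.5%, caps at about 2.74x.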

akohlmey commented 1 year ago

Instead of using hybrid/overlay, it would probably be best to have a dpd/coul/slater/gpu pair style in addition to a standalone coul/slater/gpu.
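Here is a minimal Python sketch of the fused approach suggested above. It makes one pass over a half neighbor list, computes the separation once per pair, and accumulates both the DPD conservative force A(1 - r/r_c) and a cutoff Slater Coulomb force. This illustration rests on simplifying assumptions and is not the LAMMPS or GPU package implementation:

- the function name and data layout are hypothetical;
- the dissipative and random DPD terms are omitted;
- periodic images are ignored;
- the Coulomb part is a plain cutoff with no Ewald/PPPM split.

```python
import math


def fused_dpd_slater_forces(pos, charge, pairs, a_ij, rc_dpd, lam, rc_coul, coulomb_const=1.0):
    """Accumulate DPD conservative + Slater Coulomb forces in one pass (hypothetical sketch).

    pos: list of (x, y, z); charge: list of q; pairs: half neighbor list of (i, j).
    Omits the dissipative/random DPD terms, periodic images and any k-space part.
    """
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j in pairs:
        # Shared work: separation vector and distance computed once per pair.
        dx = [pos[i][d] - pos[j][d] for d in range(3)]
        rsq = sum(c * c for c in dx)
        if rsq == 0.0:
            continue  # skip coincident particles to avoid dividing by zero
        r = math.sqrt(rsq)
        fpair = 0.0  # scalar such that the force on i is fpair * dx
        if r < rc_dpd:
            # Conservative DPD force A (1 - r/rc) along dx/r.
            fpair += a_ij * (1.0 - r / rc_dpd) / r
        if r < rc_coul and charge[i] != 0.0 and charge[j] != 0.0:
            # Slater-smeared Coulomb force C qi qj / r^2 * [1 - (1 + 2x + 2x^2) exp(-2x)] along dx/r.
            x = r / lam
            slater = (1.0 + 2.0 * x + 2.0 * x * x) * math.exp(-2.0 * x)
            fpair += coulomb_const * charge[i] * charge[j] * (1.0 - slater) / (rsq * r)
        for d in range(3):
            forces[i][d] += fpair * dx[d]  # Newton's third law: equal and opposite
            forces[j][d] -= fpair * dx[d]
    return forces


if __name__ == "__main__":
    pos = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
    print(fused_dpd_slater_forces(pos, [1.0, -1.0], [(0, 1)], 78.0, 1.0, 0.25, 3.0))
```

With hybrid/overlay, each sub-style makes its own pass over neighbor data. A combined style shares the distance computation and the traversal, which is likely to matter most on a GPU, where memory traffic often dominates the cost of a pair kernel.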

Eddy-Barraud commented 9 months ago

Hello @ndtrung81, do you have any update on this task? I will soon be modeling very large polyelectrolytes, and this improvement would benefit me. Let me know if you would like me to code something in CUDA, though I am not an expert.

ndtrung81 commented 9 months ago

@Eddy-Barraud I have implemented the GPU version of coul/slater/long in PR #4009.

Eddy-Barraud commented 9 months ago

That is awesome! Thank you 😁