Open LonelyCat124 opened 3 years ago
Ok, this isn't as straightforward as I'd hoped, and I think the invoke framework needs some code improvements that separate the calls to is_safe_to_combine based upon the type of the incoming kernel.
A prime example of where combining could happen but is not currently supported is in the dl_meso example, for velocity_verlet_stage2 and mdvv_kinetic_energy_compute. The kernels are:
var rmass = 1.0 / part1.core_part_space.mass
part1.core_part_space.vel_x = part1.core_part_space.vel_x + 0.5 * config.tstep * ((part1.fxx) * rmass) --TODO NYI: constant force bdfrcx
part1.core_part_space.vel_y = part1.core_part_space.vel_y + 0.5 * config.tstep * ((part1.fyy) * rmass) --TODO NYI: constant force bdfrcx
part1.core_part_space.vel_z = part1.core_part_space.vel_z + 0.5 * config.tstep * ((part1.fzz) * rmass) --TODO NYI: constant force bdfrcx
and
config.mdvv_type.strscxx += part1.core_part_space.mass * part1.core_part_space.vel_x * part1.core_part_space.vel_x
config.mdvv_type.strscxy += part1.core_part_space.mass * part1.core_part_space.vel_x * part1.core_part_space.vel_y
config.mdvv_type.strscxz += part1.core_part_space.mass * part1.core_part_space.vel_x * part1.core_part_space.vel_z
config.mdvv_type.strscyy += part1.core_part_space.mass * part1.core_part_space.vel_y * part1.core_part_space.vel_y
config.mdvv_type.strscyz += part1.core_part_space.mass * part1.core_part_space.vel_y * part1.core_part_space.vel_z
config.mdvv_type.strsczz += part1.core_part_space.mass * part1.core_part_space.vel_z * part1.core_part_space.vel_z
Since these operators are applied in-order and on a per-element basis, there is no question they can be combined.
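To make the argument concrete, here is a rough sketch in plain Python (not the generated Regent) that runs the two kernels once as two separate loops and once as a single fused loop. It is cut down to the x component only, and `Particle`, `run_separately`, `run_combined` and `make_particles` are hypothetical names used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    mass: float
    vel_x: float
    fxx: float


def stage2(p, tstep):
    # x component of velocity_verlet_stage2
    rmass = 1.0 / p.mass
    p.vel_x = p.vel_x + 0.5 * tstep * (p.fxx * rmass)


def kinetic(p, config):
    # xx component of mdvv_kinetic_energy_compute (a reduction into config)
    config["strscxx"] += p.mass * p.vel_x * p.vel_x


def run_separately(parts, tstep):
    # Two passes: every particle finishes stage2 before any reduction happens.
    config = {"strscxx": 0.0}
    for p in parts:
        stage2(p, tstep)
    for p in parts:
        kinetic(p, config)
    return config["strscxx"]


def run_combined(parts, tstep):
    # One fused pass: each particle is updated, then reduced immediately.
    config = {"strscxx": 0.0}
    for p in parts:
        stage2(p, tstep)
        kinetic(p, config)
    return config["strscxx"]


def make_particles():
    # Arbitrary illustrative values, not taken from DL_MESO.
    return [Particle(mass=1.0 + i, vel_x=0.5 * i, fxx=2.0 - i) for i in range(4)]


separate = run_separately(make_particles(), tstep=0.01)
combined = run_combined(make_particles(), tstep=0.01)
```

Each particle's reduction only reads that particle's own updated velocity, and the accumulation order is the same in both versions, so `separate` and `combined` hold the same value.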
The code to check whether per_part kernels can be combined should be very simple.
It only needs to care about the config updates (as these are the only "global" operations); all per-particle operations are applied in-order and on a per-element basis, so they cannot interfere with each other.
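A minimal sketch of what such a check might look like, assuming each kernel can report which config fields it reads and which it updates. `KernelInfo` and `per_part_safe_to_combine` are hypothetical names, not the existing framework API, and treating two kernels that update the same config field as a conflict is a conservative assumption on my part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelInfo:
    # Hypothetical summary of a per_part kernel's accesses to config.
    name: str
    config_reads: frozenset = frozenset()
    config_writes: frozenset = frozenset()


def per_part_safe_to_combine(first, second):
    """Return True if two per_part kernels have no conflict through config."""
    # second would observe a partially updated value written by first
    if first.config_writes & second.config_reads:
        return False
    # first (on later particles) would observe values already updated by second
    if first.config_reads & second.config_writes:
        return False
    # conservative assumption: interleaved updates to one field are not allowed
    if first.config_writes & second.config_writes:
        return False
    return True


stage2 = KernelInfo("velocity_verlet_stage2", config_reads=frozenset({"tstep"}))
kinetic = KernelInfo(
    "mdvv_kinetic_energy_compute",
    config_writes=frozenset(
        {"strscxx", "strscxy", "strscxz", "strscyy", "strscyz", "strsczz"}
    ),
)
# Hypothetical kernel that reads a reduced value, for contrast.
uses_stress = KernelInfo("uses_stress", config_reads=frozenset({"strscxx"}))

ok = per_part_safe_to_combine(stage2, kinetic)
not_ok = per_part_safe_to_combine(kinetic, uses_stress)
```

With the two kernels above, `ok` is `True` because the only config field read (`tstep`) is never updated. `not_ok` is `False` because the hypothetical `uses_stress` kernel would see a partially accumulated `strscxx` inside a fused loop.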
It may sometimes also be possible to relax the constraints that prevent merging PER_PART kernels.
For example, we currently forbid merging any PER_PART kernels that update particle positions or cutoff.
However, it may be possible to combine kernels more leniently than that. If a PER_PART kernel updates particle positions or cutoff, but no later computation depends on those updates, the kernels can still be merged, since the particle position and cutoff then have no bearing on whether a computation should take place.
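As a sketch of the relaxed rule, assuming each kernel can be summarised by the particle fields it reads and writes: the field names in `SPATIAL_FIELDS` and the function `can_merge_relaxed` are hypothetical and only stand in for whatever the framework actually tracks.

```python
# Hypothetical names for the position and cutoff fields.
SPATIAL_FIELDS = frozenset({"pos_x", "pos_y", "pos_z", "cutoff"})


def can_merge_relaxed(kernels):
    """kernels: list of (reads, writes) field-name sets, in program order.

    Merging is refused only if a kernel reads a position or cutoff field
    that an earlier kernel in the same group has already updated.
    """
    dirty = set()  # spatial fields updated by earlier kernels in the group
    for reads, writes in kernels:
        if dirty & set(reads):
            return False
        dirty |= set(writes) & SPATIAL_FIELDS
    return True


# A kernel that moves particles, followed by one that only touches velocity.
move = ({"pos_x", "vel_x"}, {"pos_x"})
damp = ({"vel_x"}, {"vel_x"})
# A kernel that depends on the updated position.
wrap = ({"pos_x"}, {"pos_x"})

independent = can_merge_relaxed([move, damp])
dependent = can_merge_relaxed([move, wrap])
```

Here `independent` is `True`, since nothing after the position update reads a position or cutoff, while `dependent` is `False`. The current behaviour would reject both groups because `move` writes `pos_x`.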
I think I can implement this pretty straightforwardly in the current framework, and it could potentially help with some of the performance issues inside DL_MESO (together with some other instruction reordering).