Relax constraints on merging PER_PART kernels?

LonelyCat124 commented 3 years ago

It may sometimes be possible to relax the constraints that prevent merging PER_PART kernels.

For example, we currently forbid merging any PER_PART kernels that update particle positions or cutoff.

However, it may be possible to combine kernels more leniently than that. If a PER_PART kernel updates particle positions or cutoff, but the subsequent computations are independent of those updates, the kernels can still be merged: particle positions and cutoff have no bearing on whether a per-particle computation takes place.
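A minimal sketch of the relaxed rule, assuming a hypothetical kernel descriptor that records the kernel type and the particle fields it writes (`Kernel`, `kind`, `writes`, `position_update_allows_merge` and the field names are illustrative, not the RegentParticleDSL API). Under this reading, a position or cutoff update only blocks a merge when the incoming kernel's iteration space depends on those values, as a pairwise kernel's neighbour selection does; a following PER_PART kernel visits every particle regardless.

```python
from dataclasses import dataclass, field

# Hypothetical field names standing in for particle positions and cutoff.
POSITION_AND_CUTOFF_FIELDS = frozenset({
    "core_part_space.pos_x",
    "core_part_space.pos_y",
    "core_part_space.pos_z",
    "core_part_space.cutoff",
})


@dataclass(frozen=True)
class Kernel:
    """Hypothetical kernel descriptor: its type and the particle fields it writes."""
    name: str
    kind: str  # assumed to be "PER_PART" or "PAIRWISE"
    writes: frozenset = field(default_factory=frozenset)


def position_update_allows_merge(first: Kernel, second: Kernel) -> bool:
    """Relaxed rule: can `second` be merged after `first` as far as positions/cutoff go?"""
    if not (first.writes & POSITION_AND_CUTOFF_FIELDS):
        # No position or cutoff update, so this rule never blocks the merge.
        return True
    # A PER_PART kernel runs once for every particle, so whether it executes
    # does not depend on the updated positions or cutoff. A pairwise kernel
    # selects its interactions by position and cutoff, so it must still wait.
    return second.kind == "PER_PART"


drift = Kernel("update_positions", "PER_PART", frozenset({"core_part_space.pos_x"}))
kinetic = Kernel("kinetic_energy", "PER_PART")
forces = Kernel("pairwise_forces", "PAIRWISE")
print(position_update_allows_merge(drift, kinetic))  # True
print(position_update_allows_merge(drift, forces))   # False
```

This covers only the position/cutoff part of the decision; any other conflicts (such as shared config values) would still need their own check.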

I think I can implement this fairly straightforwardly in the current framework, and, combined with some other instruction reordering, it could help with some of the performance issues in DL_MESO.

LonelyCat124 commented 3 years ago

OK, this isn't as straightforward as I'd hoped. I think the invoke framework needs some code improvements so that it separates calls to is_safe_to_combine based on the type of the incoming kernel.

A prime example of kernels that could be combined, but currently are not, is in the dl_meso example: velocity_verlet_stage2 followed by mdvv_kinetic_energy_compute. The kernel bodies are:

        var rmass = 1.0 / part1.core_part_space.mass
        part1.core_part_space.vel_x = part1.core_part_space.vel_x + 0.5 * config.tstep * ((part1.fxx) * rmass) --TODO NYI: constant force bdfrcx
        part1.core_part_space.vel_y = part1.core_part_space.vel_y + 0.5 * config.tstep * ((part1.fyy) * rmass) --TODO NYI: constant force bdfrcy
        part1.core_part_space.vel_z = part1.core_part_space.vel_z + 0.5 * config.tstep * ((part1.fzz) * rmass) --TODO NYI: constant force bdfrcz

and

        config.mdvv_type.strscxx += part1.core_part_space.mass * part1.core_part_space.vel_x * part1.core_part_space.vel_x
        config.mdvv_type.strscxy += part1.core_part_space.mass * part1.core_part_space.vel_x * part1.core_part_space.vel_y
        config.mdvv_type.strscxz += part1.core_part_space.mass * part1.core_part_space.vel_x * part1.core_part_space.vel_z
        config.mdvv_type.strscyy += part1.core_part_space.mass * part1.core_part_space.vel_y * part1.core_part_space.vel_y
        config.mdvv_type.strscyz += part1.core_part_space.mass * part1.core_part_space.vel_y * part1.core_part_space.vel_z
        config.mdvv_type.strsczz += part1.core_part_space.mass * part1.core_part_space.vel_z * part1.core_part_space.vel_z

Since these operations are applied in order and on a per-element basis, they can clearly be combined: the second kernel reads each particle's velocity only after the first kernel has updated that same particle.
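To illustrate why the merge is safe, here is a small sketch that models the two kernel bodies in plain Python, with particles as dicts and the stress accumulators (`strscxx` and so on) as a dict keyed `"xx"`, `"xy"`, etc. These data structures are illustrative assumptions, not how RegentParticleDSL stores particles. Running the kernels as two separate loops and as one fused loop produces identical results, because each particle's velocity is updated before it is read and the accumulation order over particles is unchanged.

```python
import copy

# The six stress components accumulated by mdvv_kinetic_energy_compute.
PAIRS = [("x", "x"), ("x", "y"), ("x", "z"), ("y", "y"), ("y", "z"), ("z", "z")]


def stage2(p, tstep):
    """Per-particle body of velocity_verlet_stage2 (constant body force omitted, as in the original)."""
    rmass = 1.0 / p["mass"]
    p["vel_x"] += 0.5 * tstep * (p["fxx"] * rmass)
    p["vel_y"] += 0.5 * tstep * (p["fyy"] * rmass)
    p["vel_z"] += 0.5 * tstep * (p["fzz"] * rmass)


def kinetic(p, stress):
    """Per-particle body of mdvv_kinetic_energy_compute: a reduction into global (config) values."""
    for a, b in PAIRS:
        stress[a + b] += p["mass"] * p["vel_" + a] * p["vel_" + b]


def run_separately(parts, tstep):
    stress = {a + b: 0.0 for a, b in PAIRS}
    for p in parts:
        stage2(p, tstep)
    for p in parts:
        kinetic(p, stress)
    return stress


def run_fused(parts, tstep):
    stress = {a + b: 0.0 for a, b in PAIRS}
    for p in parts:
        stage2(p, tstep)    # this particle's velocity is updated first...
        kinetic(p, stress)  # ...then read, exactly as in the unfused order
    return stress


parts = [dict(mass=1.0 + i, vel_x=0.1 * i, vel_y=-0.2, vel_z=0.3, fxx=1.0, fyy=2.0, fzz=-1.0)
         for i in range(4)]
print(run_separately(copy.deepcopy(parts), 0.01) == run_fused(copy.deepcopy(parts), 0.01))  # True
```

The results match exactly here because both versions perform the same floating-point operations in the same order for every accumulator.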

LonelyCat124 commented 3 years ago

The code to check whether PER_PART kernels can be combined should be very simple.

It only needs to consider the config updates, as these are the only "global" operations; all per-particle operations are applied in order and on a per-element basis, so they cannot interfere with one another.
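A minimal sketch of such a check, assuming each PER_PART kernel can be summarised by the config fields it reads and writes (`PerPartKernel`, `config_reads`, `config_writes` and `per_part_safe_to_combine` are hypothetical names, not the existing invoke framework). Particle fields are ignored entirely, as described above. Treating two kernels that write the same config field as a conflict is a conservative assumption; concurrent reductions into one field might be allowable in practice.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PerPartKernel:
    """Hypothetical summary of a PER_PART kernel's accesses to global config fields."""
    name: str
    config_reads: frozenset = field(default_factory=frozenset)
    config_writes: frozenset = field(default_factory=frozenset)


def per_part_safe_to_combine(first: PerPartKernel, second: PerPartKernel) -> bool:
    # Particle fields are not checked: per-particle accesses happen in order on
    # the same element. Only config fields are shared between particles.
    if first.config_writes & second.config_reads:
        return False  # second would read a value first has only partially accumulated
    if second.config_writes & first.config_reads:
        return False  # first would see values second already wrote for earlier particles
    if first.config_writes & second.config_writes:
        return False  # conservative: treat shared config writes as a conflict
    return True


# The dl_meso pair, treating the += accumulations as config writes.
stage2 = PerPartKernel("velocity_verlet_stage2", config_reads=frozenset({"tstep"}))
kinetic = PerPartKernel(
    "mdvv_kinetic_energy_compute",
    config_writes=frozenset({"mdvv_type.strscxx", "mdvv_type.strscxy", "mdvv_type.strscxz",
                             "mdvv_type.strscyy", "mdvv_type.strscyz", "mdvv_type.strsczz"}),
)
print(per_part_safe_to_combine(stage2, kinetic))  # True
```

A kernel that later reads the accumulated stress values, by contrast, would be rejected by the first test, since it would otherwise see partial sums.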

stfc / RegentParticleDSL, issue #94