Invoke syntax - Githubissues

LonelyCat124 commented 3 years ago

I'm working on the invoke syntax at the moment.

My current thoughts are:

Invoke call takes a set of input pairs of kernel function & type of kernel, i.e. invoke({kernel_one, PAIRWISE}, {kernel_two, PER_PART}, ...).
By default, the end of each invoke call has a barrier. There will be an optional argument that can be passed to disable the barrier.
At the moment the coherence modes are just the default. I think this probably will need changing soon, and options to the invoke call should be available.

LonelyCat124 commented 3 years ago

Initial implementation of this is in the commit abb0afb , using the implementation above.

I guess discussion points are:

Should a barrier be there by default? This can be a per-neighbour search thing anyway.
What is the best coherence mode for the default? How is best to change this using quotes?

LonelyCat124 commented 3 years ago

A thought I had is:

Tasks use simultaneous coherence.
Default is barrier on at the end of an invoke call.
Kernels inside the same invoke call can be reordered by the runtime.

I think this is the best default behaviour i can think of.

LonelyCat124 commented 3 years ago

Further to the previous - Tasks that update the position or cutoff use exclusive coherence.

LonelyCat124 commented 3 years ago

Ok, so the coherence setting is pretty easy:

local coherences = terralib.newlist()
if update_neighbours then
  coherences:insert( regentlib.coherence( regentlib.exclusive, parts1 ) )
  coherences:insert( regentlib.coherence( regentlib.exclusive, parts2 ) )
else
  coherences:insert( regentlib.coherence( regentlib.atomic, parts1 ) )
  coherences:insert( regentlib.coherence( regentlib.atomic, parts2 ) )
end

So the plan for now will be:

The invoke function is declared as:

function invoke( config, ...)

In the ... arguments we search for the NO_BARRIER option. The rest we look for {kernel_function, TYPE_OF_KERNEL} tables. Any input arguments that are not these are currently ignored. We could make this throw an error but I don't think it matters?

For each kernel in the list (in the order they are passed) we do the following: Compute the privileges for the kernels, and check if they write to position or cutoff variables.

If not, and the kernel is the same type as the previous input, we add it to the list of kernels to place into the next task.
If not, and the kernel is a different type to the previous input, we create the task for the list of kernels currently saved, clear the list of kernels and add this kernel to the list.
If it does, then create a task for the list of kernels currently saved, clear the list, and create a task for this kernel alone.

Edit: I don't know for sure that combining kernels is a good idea for performance so I'll probably also add an input keyword to disable kernel merging. Edit 2: For asymmetric tasks (read-only for parts2) we can allow atomic coherence always for the read-only region I think.

LonelyCat124 commented 3 years ago

Ok - I believe I have an implementation of this in ad2523d now working correctly according to what has been discussed in this thread.

LonelyCat124 commented 3 years ago

Ok, so at the moment the combining of kernels ignore certain dependencies I believe. We need to disable WaW and RaW dependencies. At the moment if we have 2 kernels

part1.x = part1.x + 1

part1.y = part2.x + 1

part1.y is almost certainly not the same as if the tasks are done one after the other.

What we need to do is have the safe_to_combine function take in the current list of kernels, and dissallow combining if the current kernel reads or writes to a particle member that was previously written to by any of the combined kernel list.

LonelyCat124 commented 3 years ago

Fix for this in the ISPH branch (93214e9) - this is now slightly conservative as it treats a task that can't be combined due to RaW/WaW/WaR dependency the same as a task that can't be combined as it invalidates the neighbour search.

This means it has a task by itself, instead of attempting to combine it with any tasks after. Will remove the bug label once this is merged to master.

Edit: Also notably the previous implementation would be correct for per-particle tasks as these don't have pairwise dependencies invalidating just doing things in the same order.

LonelyCat124 commented 3 years ago

Coherences being atomic is not taking into account ordering right now which is an issue. I need to think how to enforce ordering for things like task a happening before task b, but allowing any of task a to happen in any order.

This could be done through empty "in-between" tasks, or through barriers, but I believe this is not correct behaviour right now.

LonelyCat124 commented 3 years ago

For now I've switched back to exclusive coherence - it would be nice to disable this for the same "types" of task - alternatively we can go back to attempting to detect reduction operations.

LonelyCat124 commented 3 years ago

Ok - so the latest commit now allows the program to detect which kernels can be combined but not with the current set, and which kernels can never be combined and should treat them appropriately. Removing the bug tag as of here.

LonelyCat124 commented 3 years ago

Ok - this now works as expected, I'm going to close this issue - future additions to invoke framework will be put in their own issues.

stfc / RegentParticleDSL

Invoke syntax #62