Writing the coordination with ArrayFire

I'm not opening a new PR because I do not think the work is done, but I think it is worth showing how ArrayFire can be used to write the coordination. You can see the code here.

This is heavily based on what was already done in SAXS and on what I did for the CudaCoordination. Some code is duplicated from CudaCoordination to ease the transfer of data.

The main advantage over Cuda is that Plumed already "knows" about ArrayFire in its ./configure, so it is easier to start working with AF.

The main difference from plain Cuda is that with Cuda you "do not have" tools: you craft your own, and those tools are optimized for your own problem. With ArrayFire you have a small toolset optimized for tensor calculations (up to 4D), so you have to recast your problem in terms of "tensors" (and you have to fully embrace the philosophy "If the only tool you have is a hammer, you tend to see every problem as a nail."[cit.]).

I measured with the current plumed benchmark, calling the actions with cv: *** GROUPA=@mdatoms GROUPB=@mdatoms R_0=1. I used the single-precision version to run on my local workstation GPU, together with the cuda implementation of ArrayFire. Benchmarks were run with plumed benchmark --plumed="plumed.dat:cudasingleplumed.dat:firesingleplumed.dat" --natoms=${natoms} --nsteps=1000

This is the raw time of "4 Calculating (forward loop)" only:

And this is the raw time against the base COORDINATION:

I did not manage to get the same performance boost I got with the plain Cuda implementation, but I think that sharing this may be useful as a starting point.

Writing the coordination with ArrayFire #1049