Faster arr4d boundaries

It looks like prolong_bnd_from_coarser was very inefficient in AMR simulations because of the incomplete rank-4 implementation (with a workaround through cg%wa). A somewhat more direct approach (still component-wise) gave a factor of 2 speedup on a 3D Sedov problem with 3 levels of refinement.
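To illustrate the difference between the two approaches, here is a minimal NumPy sketch. It is not the Piernik Fortran code. The function names are hypothetical, the scratch arrays only stand in for a rank-3 work array such as cg%wa, and piecewise-constant prolongation with a refinement factor of 2 is an assumption made to keep the example short.

```python
import numpy as np


def prolong3d(coarse3d):
    """Piecewise-constant prolongation (assumed): each coarse cell fills 2x2x2 fine cells."""
    return coarse3d.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)


def prolong_via_scratch(coarse4d, scratch_c, scratch_f):
    """Workaround style: stage every component through rank-3 scratch arrays."""
    ncomp = coarse4d.shape[0]
    fine4d = np.empty((ncomp,) + scratch_f.shape, dtype=coarse4d.dtype)
    for v in range(ncomp):
        scratch_c[...] = coarse4d[v]            # extra copy into the work array
        scratch_f[...] = prolong3d(scratch_c)   # rank-3 operation on the work array
        fine4d[v] = scratch_f                   # extra copy back out
    return fine4d


def prolong_direct(coarse4d):
    """More direct style: still component-wise, but acting on the rank-4 slices."""
    ncomp, nx, ny, nz = coarse4d.shape
    fine4d = np.empty((ncomp, 2 * nx, 2 * ny, 2 * nz), dtype=coarse4d.dtype)
    for v in range(ncomp):
        fine4d[v] = prolong3d(coarse4d[v])      # no staging copies
    return fine4d


if __name__ == "__main__":
    coarse = np.arange(2 * 2 * 3 * 4, dtype=float).reshape(2, 2, 3, 4)
    direct = prolong_direct(coarse)
    staged = prolong_via_scratch(coarse, np.empty((2, 3, 4)), np.empty((4, 6, 8)))
    print(direct.shape, np.array_equal(direct, staged))
```

Both routines return the same array. The direct one simply skips the two staging copies per component.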

There is another big factor to gain by optimizing how fine guardcells are calculated over the coarse boundary. Some easy gains were made in eaf63af, but real improvement will be easier to achieve with MPI-2 communication.

piernik-dev/piernik #406