Add capabilities to solve with mixed precision. In particular, the gather and scatter operations can now also apply an averaging, sum, or difference operation before copying.
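As an illustration only, here is a minimal sketch of what a mixed-precision gather with a combine step could look like. The names `combine_op` and `gather_combine` are hypothetical and are not the identifiers used in this PR. The sketch also assumes that the operation combines the incoming value with the value already stored in the destination, which the description above does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical operation tags; the names are illustrative only.
enum class combine_op { copy, avg, sum, diff };

// Gather src[indices[i]] into dst[i], converting from SrcType to DstType.
// Assumption: the operation combines the incoming value with the value
// already stored in dst[i] before it is written back.
template <typename DstType, typename SrcType>
void gather_combine(const std::vector<std::size_t>& indices,
                    const std::vector<SrcType>& src,
                    std::vector<DstType>& dst, combine_op op)
{
    assert(dst.size() >= indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i) {
        // Precision conversion happens here (e.g. double -> float).
        const auto incoming = static_cast<DstType>(src[indices[i]]);
        switch (op) {
        case combine_op::copy:
            dst[i] = incoming;
            break;
        case combine_op::avg:
            dst[i] = (dst[i] + incoming) / DstType{2};
            break;
        case combine_op::sum:
            dst[i] = dst[i] + incoming;
            break;
        case combine_op::diff:
            dst[i] = dst[i] - incoming;
            break;
        }
    }
}
```

A scatter would mirror this by writing to `dst[indices[i]]` from `src[i]`. Doing the conversion and the combine step in a single pass avoids a temporary buffer in the source precision.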
Some minor improvements are also made:
Remove the dependency on Boost (add the get_mpi_type function internally).
Fix the benchmarking documentation.
Use executor copies instead of cudaMemcpy calls.
Move gather and scatter to separate classes and namespace-qualify them.
Print timestamps and local residuals directly from the convergence checks, where they are computed, instead of through loggers.