SLATE is a distributed, GPU-accelerated, dense linear algebra library targetting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy Exascale Computing Project (ECP).
This PR removes tile life from the remaining routines. This doesn't do the big refactor to actually remove the tile life infrastructure, so that the review can focus on the logic for memory management.
A notable change is that I extended the releaseRemoteWorkspace functions to reduce multiple receive counts simultaneously, so I can get a little extra parallelism out of hbmm.
The cases I used in were routines where we used backward looking dependency (i.e., gemm[k-1]) in the compute routines. In those cases, using inout would put the release task on the critical path.
This PR removes tile life from the remaining routines. This doesn't do the big refactor to actually remove the tile life infrastructure, so that the review can focus on the logic for memory management.
A notable change is that I extended the
releaseRemoteWorkspace
functions to reduce multiple receive counts simultaneously, so I can get a little extra parallelism out ofhbmm
.