icl-utk-edu / slate

SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy Exascale Computing Project (ECP).
https://icl.utk.edu/slate/
BSD 3-Clause "New" or "Revised" License

Expose Deep Copy #77

Open wavefunction91 opened 1 year ago

wavefunction91 commented 1 year ago

Describe the problem you are solving

From what I can tell from the documentation, all Matrix copy constructors are shallow. The only way (that's obvious to me) to perform a deep copy is via the copy API:

slate::Matrix<type> A( /* some parameters */ );
// Do something with A
auto A_copy = A.emptyLike();
A_copy.insertLocalTiles();
slate::copy( A, A_copy );

This is error-prone, as it requires that A and A_copy have the same metadata to avoid redistribution, etc. (I found a different function and updated the code example.) Is there another way to perform a deep copy other than using the copy API?

Describe your proposed solution

If not, it's canonical in these situations to expose a clone API. Per the above:

class Matrix {
  Matrix clone(); // Deep copy of metadata and contents
};

slate::Matrix<type> A( /* some parameters */ );
// Do something with A
auto A_copy = A.clone();

mgates3 commented 1 year ago

So A.clone() would create a deep copy of A, basically doing emptyLike, insertLocalTiles, and copy? That seems reasonable.

Currently, copy is the right API, and yes, it requires that A and A_copy have the same distribution. copy is a local copy; it doesn't redistribute. There's another function, redistribute, that does redistribute.

wavefunction91 commented 1 year ago

> So A.clone() would create a deep copy of A.

Yes, I suppose it would be syntactic sugar for the following:

Matrix clone() { 
  auto cpy = this->emptyLike();
  cpy.insertLocalTiles();
  copy( *this, cpy );
  return cpy;
}
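
To make the shallow-versus-deep distinction concrete without depending on SLATE itself, here is a minimal, self-contained toy sketch. ToyMatrix, its members, and demo_shallow_vs_deep are hypothetical names invented for illustration; the class only mimics the handle-style semantics described above (copies share storage, clone allocates new storage) and says nothing about how SLATE stores tiles.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for a handle-style matrix (hypothetical; not a SLATE class).
// The implicitly generated copy constructor copies the shared_ptr, so copies
// share storage (shallow). clone() allocates new storage (deep).
class ToyMatrix {
public:
    ToyMatrix(std::size_t m, std::size_t n)
        : m_(m), n_(n),
          data_(std::make_shared<std::vector<double>>(m * n, 0.0)) {}

    // Same shape, fresh zero-initialized storage. This plays the role that
    // emptyLike followed by insertLocalTiles plays in the snippet above.
    ToyMatrix emptyLike() const { return ToyMatrix(m_, n_); }

    // Deep copy: allocate new storage, then copy the contents into it.
    ToyMatrix clone() const {
        ToyMatrix cpy = emptyLike();
        *cpy.data_ = *data_;
        return cpy;
    }

    // Column-major element access.
    double& operator()(std::size_t i, std::size_t j) { return (*data_)[i + j * m_]; }
    double operator()(std::size_t i, std::size_t j) const { return (*data_)[i + j * m_]; }

private:
    std::size_t m_, n_;
    std::shared_ptr<std::vector<double>> data_;
};

// Usage: a write through A is visible through the shallow copy but not the clone.
void demo_shallow_vs_deep() {
    ToyMatrix A(2, 2);
    A(0, 0) = 1.0;

    ToyMatrix shallow = A;       // shares storage with A
    ToyMatrix deep = A.clone();  // independent storage

    A(0, 0) = 42.0;
    assert(shallow(0, 0) == 42.0);
    assert(deep(0, 0) == 1.0);
}
```

Returning the clone by value keeps the call site as short as the proposed A.clone(), and the caller never has to repeat the allocation step by hand.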

Thanks for pointing to redistribute; that will be useful in our application if the passed distribution doesn't align with algorithmic requirements (e.g., heev requiring GridOrder::Col for the time being). Is it possible that redistribute would do the same work as copy if no data redistribution is needed?

mgates3 commented 1 year ago

Yes, the intention is that redistribute would effectively just do copy if they have the same distribution. We haven't looked at the efficiency of these routines much, though.
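
One way to see when a redistribution degenerates into a local copy is to compare which rank owns each tile under the two layouts. The sketch below assumes the conventional 2D block-cyclic numbering on a p-by-q process grid (column-major rank order for Col, row-major for Row). ToyGridOrder, tile_rank, and tiles_to_move are hypothetical names for illustration, and the formulas are an assumption about the layouts, not code taken from SLATE.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a grid-order setting; not SLATE's GridOrder.
enum class ToyGridOrder { Col, Row };

// Assumed 2D block-cyclic map from tile (i, j) to an owning rank on a
// p-by-q process grid. Col numbers ranks down columns of the grid first,
// Row numbers them along rows first.
int tile_rank(ToyGridOrder order, int p, int q, std::int64_t i, std::int64_t j) {
    int pi = static_cast<int>(i % p);  // process row
    int qj = static_cast<int>(j % q);  // process column
    return order == ToyGridOrder::Col ? pi + qj * p : pi * q + qj;
}

// Counts the tiles of an mt-by-nt tile grid whose owner differs between two
// layouts. Zero means every tile already sits on the rank that needs it, so
// a purely local tile-by-tile copy would suffice.
std::int64_t tiles_to_move(ToyGridOrder from, ToyGridOrder to, int p, int q,
                           std::int64_t mt, std::int64_t nt) {
    std::int64_t moved = 0;
    for (std::int64_t j = 0; j < nt; ++j) {
        for (std::int64_t i = 0; i < mt; ++i) {
            if (tile_rank(from, p, q, i, j) != tile_rank(to, p, q, i, j)) {
                ++moved;
            }
        }
    }
    return moved;
}
```

Under these assumptions, identical layouts give a count of zero, while switching between Col and Row on a 2-by-3 grid changes the owner of most tiles, which is the case where communication is unavoidable.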

Note that redistribute does not handle changing the block size. ScaLAPACK's redistribute function, P_GEMR2D, is more general in that it can change the block size.
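
A rough way to see why changing the block size is a more general problem: once the tile sizes differ, tile boundaries no longer line up, so a destination tile has to be assembled from pieces of several source tiles rather than copied tile for tile. The one-dimensional sketch below just counts that overlap. The function name and parameters are hypothetical, and this is not a description of how P_GEMR2D or redistribute is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Along one dimension of length n, returns how many source tiles (block size
// nb_src) overlap destination tile k (block size nb_dst). A result of 1 means
// the destination tile lies inside a single source tile; anything larger
// means it straddles source tile boundaries.
std::int64_t source_tiles_overlapping(std::int64_t n, std::int64_t nb_src,
                                      std::int64_t nb_dst, std::int64_t k) {
    std::int64_t first = k * nb_dst;                      // first index in tile k
    assert(first < n);                                    // tile k must exist
    std::int64_t last = std::min(first + nb_dst, n) - 1;  // last index, clipped to n
    return last / nb_src - first / nb_src + 1;
}
```

With equal block sizes the count is always 1, which matches the tile-for-tile case discussed above; with n = 12, a source block size of 4, and a destination block size of 6, each destination tile overlaps two source tiles.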