trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/
Other
1.22k stars 569 forks source link

Tpetra: CrsMatrixMultiplyOp should not store a local matrix device? #13545

Open vasylivy opened 1 month ago

vasylivy commented 1 month ago

Hi,

CrsMatrixMultiplyOp should not store a local matrix device, just the rcp to the crs matrix and then in the apply calls should get the local matrix device. A consequence of storing the local matrix device is the wrapped dual view host / device use counts will be off for col idxes in non-uvm cuda builds for any user that has CrsMatrixMultiplyOp objs at the same scope as the corresponding local matrix host obj.

Thanks,

Yaro

lucbv commented 3 weeks ago

I think one issue is that the KokkosSparse::CrsMatrix requires a view and not a dual view like Tpetra stores. How would you suggest to alleviate that constraint?

vasylivy commented 3 weeks ago

I was thinking of something like the following in the apply(..), assuming this has no perf impact?

      auto localMultiply =
          local_matrix_op_t(std::make_shared<local_matrix_device_type>(
              matrix_->getLocalMatrixDevice()));

this way it would be fine to keep CrsMatrixMultiplyOp as a member and not worry about potential conflicts w/ the hidden dual views as their scope is now limited to the apply(...) call

Yaro