Open vasylivy opened 1 month ago
I think one issue is that the KokkosSparse::CrsMatrix requires a view and not a dual view like Tpetra stores. How would you suggest to alleviate that constraint?
I was thinking of something like the following in the apply(..), assuming this has no perf impact?
auto localMultiply =
local_matrix_op_t(std::make_shared<local_matrix_device_type>(
matrix_->getLocalMatrixDevice()));
this way it would be fine to keep CrsMatrixMultiplyOp as a member and not worry about potential conflicts w/ the hidden dual views as their scope is now limited to the apply(...) call
Yaro
Hi,
CrsMatrixMultiplyOp should not store a local matrix device, just the rcp to the crs matrix and then in the apply calls should get the local matrix device. A consequence of storing the local matrix device is the wrapped dual view host / device use counts will be off for col idxes in non-uvm cuda builds for any user that has CrsMatrixMultiplyOp objs at the same scope as the corresponding local matrix host obj.
Thanks,
Yaro