Closed Rbiessy closed 6 years ago
GatherOp used to copy large chunks of submatrices row by row using 2 chip operations, which made it far too slow. It now uses memcpy when possible.
This should be pushed to dev/eigen_mehdi or integration/eigen_mehdi as well.
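
For illustration only, here is a minimal sketch of the idea, not the actual GatherOp code. It assumes a row-major `float` buffer and indices that the caller has already validated; the name `gather_rows` and its parameters are hypothetical. Each gathered row is copied with `std::memcpy`, and runs of consecutive indices are merged into a single copy.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not the real GatherOp): gathers rows of a row-major
// matrix `params`, with `row_size` floats per row, in the order given by
// `indices`. Assumes every index is in range; no bounds checking is done.
std::vector<float> gather_rows(const std::vector<float>& params,
                               std::size_t row_size,
                               const std::vector<std::size_t>& indices) {
  std::vector<float> out(indices.size() * row_size);
  if (row_size == 0) return out;  // nothing to copy

  std::size_t i = 0;
  while (i < indices.size()) {
    // Extend the run while the next index is the next row in memory.
    std::size_t run = 1;
    while (i + run < indices.size() &&
           indices[i + run] == indices[i] + run) {
      ++run;
    }
    // One bulk copy for the whole run instead of one operation per row.
    std::memcpy(out.data() + i * row_size,
                params.data() + indices[i] * row_size,
                run * row_size * sizeof(float));
    i += run;
  }
  return out;
}
```

Calling `gather_rows(params, 2, {1, 2, 0})` on a 4x2 matrix would copy rows 1 and 2 in one `memcpy` and row 0 in a second one.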