Open bd4 opened 3 years ago
I guess I'm not necessarily a great fan of using transform as the API for this. It's usually based on iterators, a bit awkward to use, and it's not lazy. My suggestion would be to implement gt::vectorize (see numpy/xtensor) instead.
This can be implemented via thrust::transform for CUDA/HIP and via std::transform for Intel SYCL.
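
To make the idea concrete, here is a rough host-only sketch of what a vectorize-style wrapper could look like. The names `vectorize` and `example` are hypothetical and this is not gtensor's actual API. It uses std::vector and std::transform as stand-ins for a gtensor container and a backend-specific transform, and it evaluates eagerly, so it does not address the laziness point.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the proposed gt::vectorize (assumption, not an
// existing gtensor function): wraps a scalar function so it can be called
// on a whole container.
template <typename F>
auto vectorize(F f)
{
  return [f = std::move(f)](const std::vector<double>& in) {
    std::vector<double> out(in.size());
    // A device backend could swap this call for thrust::transform.
    std::transform(in.begin(), in.end(), out.begin(), f);
    return out;
  };
}

// Usage: the caller never touches iterators directly.
inline std::vector<double> example()
{
  auto square = vectorize([](double x) { return x * x; });
  std::vector<double> y = square({1.0, 2.0, 3.0});
  assert(y.size() == 3);
  return y;
}
```

The point of the sketch is only the call-site shape: the user passes a scalar function once and then applies the result to containers, with the iterator plumbing hidden inside the wrapper.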