kokkos / kokkos-core-wiki


Clarify parallel_scan behavior #513

Closed masterleinad closed 4 months ago

masterleinad commented 4 months ago

Related to https://github.com/kokkos/kokkos/issues/6897. This documents the current behavior of parallel_scan. The Serial backend calls the functor only once, with final=true, while the Cuda backend calls it once with final=false in its first kernel, and then once with final=false and once with final=true in its second kernel. This led to hard-to-understand problems when trying to cache results for final=false. The least we can do is document the current restrictions better.
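To illustrate the difference in call patterns described above, here is a minimal sketch in plain C++ that does not use Kokkos itself. The drivers `scan_single_pass` and `scan_two_kernel` are hypothetical stand-ins that only mimic the sequence of functor calls mentioned for the Serial and Cuda backends; they are not the actual backend implementations. The `PrefixSum` functor and its `calls` counter are likewise made up for illustration.

```cpp
#include <vector>

// Scan functor in the style of a parallel_scan functor: it always adds its
// contribution to `update`, and writes a result only when `final` is true.
struct PrefixSum {
  const std::vector<int>& in;
  std::vector<int>& out;
  int calls;  // number of invocations, tracked for illustration only

  PrefixSum(const std::vector<int>& in_, std::vector<int>& out_)
      : in(in_), out(out_), calls(0) {}

  void operator()(int i, int& update, bool final) {
    ++calls;
    if (final) out[i] = update;  // exclusive prefix sum for index i
    update += in[i];
  }
};

// Hypothetical single-pass driver (assumption: mimics the Serial call
// pattern): one call per index, always with final=true.
template <class F>
void scan_single_pass(int n, F& f) {
  int update = 0;
  for (int i = 0; i < n; ++i) f(i, update, true);
}

// Hypothetical two-kernel driver (assumption: mimics the Cuda call pattern):
// one final=false call per index in the first pass, then one final=false call
// and one final=true call per index in the second pass.
template <class F>
void scan_two_kernel(int n, F& f) {
  // First "kernel": accumulate contributions only.
  int total = 0;
  for (int i = 0; i < n; ++i) f(i, total, false);

  // Second "kernel": recompute each contribution, then do the final call
  // with the exclusive prefix for that index.
  int prefix = 0;
  for (int i = 0; i < n; ++i) {
    int contribution = 0;
    f(i, contribution, false);
    int update = prefix;
    f(i, update, true);
    prefix += contribution;
  }
}
```

A functor written like `PrefixSum`, which has side effects only when `final` is true, produces the same output under both drivers even though it is invoked a different number of times. A functor that stores intermediate results during a final=false call and expects to read them back later would not work with the single-pass driver, since that call never happens there.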