Closed. shamanDevel closed this issue 1 year ago.
Hi, I'm currently trying to cast/copy an accumulator fragment directly to an input fragment without going through shared memory. Can I use this library for that purpose? Is there already predefined functionality for it? I can't find any in the docs or sources. I tried working with `foreach`, but it only provides the mapping for a single fragment inside the lambda, not for two. I'm currently stuck on this problem; any help would be appreciated. Thanks!
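For context on the `foreach` attempt (assuming the library in question is wmma_extension and its `mtk::wmma::foreach` helper; the lambda signature below is recalled from that library's README and may differ between versions), the callback exposes the register-to-memory mapping for one fragment type at a time, e.g. to fill a fragment from memory:

```cuda
#include <mma.h>
#include <cuda_fp16.h>
#include <wmma_extension/wmma_extension.hpp>

__global__ void load_via_foreach(const float* src) {
    nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, 16, 16, 16, half,
                           nvcuda::wmma::row_major> a_frag;

    // The lambda is invoked for the memory indices owned by this lane and
    // receives the matching register indices for *this fragment type only*.
    mtk::wmma::foreach<decltype(a_frag)>(
        [&](const unsigned* frag_index_list,
            const unsigned fragment_index_count,
            const unsigned mem_index) {
            for (unsigned i = 0; i < fragment_index_count; i++) {
                a_frag.x[frag_index_list[i]] = __float2half(src[mem_index]);
            }
        });

    // An accumulator element at a given mem_index generally lives in a
    // different lane than the matrix_a element for the same mem_index, so
    // an accumulator-to-input copy needs cross-lane data movement, not just
    // the per-lane index remapping that foreach provides.
}
```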
Hi @shamanDevel, thank you for your question, and sorry for the late reply. This library doesn't have that functionality. I have tried to implement it, but I found it difficult: it needs data transfer within the warp, and the shuffling pattern is not a good fit for the CUDA warp shuffle instructions. Therefore, I use shared memory for this purpose in my application. Sorry I can't help you further. Thank you.
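For reference, here is a minimal sketch of the shared-memory workaround described above, using only the standard `nvcuda::wmma` API (the kernel name and the 16x16x16 half-precision tile shape are illustrative assumptions, not part of this library):

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One-warp kernel: compute C = A*B, then reuse C as the matrix_a
// input of a second product by staging it through shared memory.
__global__ void reuse_accumulator(const half* A, const half* B, half* D) {
    __shared__ half staging[16 * 16];

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> c_frag;

    wmma::load_matrix_sync(a_frag, A, 16);
    wmma::load_matrix_sync(b_frag, B, 16);
    wmma::fill_fragment(c_frag, __float2half(0.0f));
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);

    // Accumulator -> matrix_a: store the tile, then reload it with the
    // matrix_a layout. load/store_matrix_sync are warp-synchronous, so no
    // __syncthreads() is needed while the buffer is used by a single warp.
    wmma::store_matrix_sync(staging, c_frag, 16, wmma::mem_row_major);
    wmma::load_matrix_sync(a_frag, staging, 16);

    // Second product, reusing the previous result as input.
    wmma::fill_fragment(c_frag, __float2half(0.0f));
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
    wmma::store_matrix_sync(D, c_frag, 16, wmma::mem_row_major);
}
```

The round trip costs one shared-memory store and load per tile, but it works for any fragment layout because the hardware resolves the mapping on both sides.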
Ok, sounds fair. Let's hope that NVIDIA will provide this feature directly in their public API in the future.