mtazzari / galario

Gpu Accelerated Library for Analysing Radio Interferometer Observations
https://mtazzari.github.io/galario/
GNU Lesser General Public License v3.0

Use streams to avoid blocking GPU while copying #60

Closed by mtazzari 7 years ago

mtazzari commented 7 years ago

An important point is raised in this blog post:

https://devblogs.nvidia.com/parallelforall/how-overlap-data-transfers-cuda-fortran/

If we use the default stream, neither we nor anybody else using the library can overlap copies and kernel execution. As library authors, we should not use the default stream!
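To make the point concrete, here is a minimal sketch of what issuing work on a non-default stream could look like. It is not taken from galario: the kernel `scale`, the buffer size, and the variable names are hypothetical, and error checking is left out for brevity. It assumes pinned host memory, which `cudaMemcpyAsync` needs to actually run asynchronously, and a stream created with `cudaStreamNonBlocking` so that it does not implicitly synchronize with the legacy default stream.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel standing in for the real GPU work.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* host = nullptr;
    float* dev = nullptr;
    // Pinned (page-locked) host memory, needed for truly asynchronous copies.
    cudaMallocHost(&host, bytes);
    cudaMalloc(&dev, bytes);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // A non-blocking stream does not synchronize with the legacy default stream,
    // so work the caller issues elsewhere is free to overlap with ours.
    cudaStream_t stream;
    cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking);

    // Copy in, compute, copy out: ordered within our stream only.
    cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, n, 2.0f);
    cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);

    // Wait for our stream only, not for the whole device.
    cudaStreamSynchronize(stream);
    printf("host[0] = %f\n", host[0]);

    cudaStreamDestroy(stream);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```

Whether the library should create its own stream or accept one from the caller is a design choice this sketch does not settle.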

fredRos commented 7 years ago

Yes, agreed. But it's not a top priority, as it doesn't change the speed of a single transform.

mtazzari commented 7 years ago

As suggested by Richard: try compiling with the --default-stream per-thread option! Docs here: https://devblogs.nvidia.com/parallelforall/gpu-pro-tip-cuda-7-streams-simplify-concurrency/
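A rough sketch of how this could be tried, loosely following the idea in the linked post: several host threads each launch work on "the default stream", and with the option enabled each thread gets its own default stream. The file name, the kernel `busy`, and the thread count are made up for illustration, and this is not galario code. Note that it only covers threads inside a single process.

```cuda
// Assumed build line (file name is hypothetical):
//   nvcc -std=c++11 --default-stream per-thread per_thread.cu -o per_thread
// Depending on the toolchain, linking against pthreads may also be required.
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Hypothetical kernel that just keeps the GPU busy for a while.
__global__ void busy(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = sqrtf(x[i] + 1.0f);
}

void worker(int n) {
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemsetAsync(d, 0, n * sizeof(float), 0);

    // No stream argument: this goes to the default stream. When compiled with
    // --default-stream per-thread, that is this host thread's own stream.
    busy<<<(n + 255) / 256, 256>>>(d, n);

    // Stream 0 refers to the per-thread default stream under that option.
    cudaStreamSynchronize(0);
    cudaFree(d);
}

int main() {
    const int n = 1 << 20;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) threads.emplace_back(worker, n);
    for (auto& th : threads) th.join();
    cudaDeviceReset();
    return 0;
}
```

Comparing a profiler timeline of this program built with and without the option should show whether the kernels from different threads overlap.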

fredRos commented 7 years ago

Oh man, such an easy solution. I will create a PR for @mtazzari to test on his GPU. But we have to test that this also helps with multiple processes, as that's the actual use case with emcee.