v-morello / riptide

Pulsar searching with the Fast Folding Algorithm (FFA)
https://riptide-ffa.readthedocs.io/
MIT License

GPU Acceleration 🤔 ? #2

Closed astrogewgaw closed 6 months ago

astrogewgaw commented 3 years ago
This is a **feature request** 😁 : Jayanta and I have been talking about deploying a real-time pulsar search pipeline at the GMRT. We already have GPU-based dedispersion codes that we can use. While we are going to test out a hybrid system first (that is, dedispersion on the GPU and FFA, using `riptide`, on the CPU), we were also thinking about porting `riptide`'s FFA kernels to the GPU. It should be possible to write highly parallelized kernels for the GPU and bind them to Python using `pybind11`, something that `riptide` is already doing for the CPU kernels. Is this something that's on `riptide`'s roadmap?
v-morello commented 3 years ago

It's one of those things that I imagine myself doing "when I get some free time", but that's unlikely to happen soon. It's harder and less useful than you might think, though. With a handful of CPUs you can already cover the long-period regime with good phase resolution; even if you could obtain a large speedup, you'd mostly gain the ability to cover shorter periods, where the extra sensitivity over the FFT is not particularly dramatic. Also, searching shorter periods requires more tightly spaced DM trials, so there's a double-whammy cost increase when searching short periods (I talk about this in the paper; see the b^3 term, where b is the number of phase bins). Another thing to consider is that the peak-finding algorithm currently consumes a fair amount of the run time, so if you made the FFA + matched filtering, say, 10x faster, you'd have to make peak finding much faster as well for it to matter (and it needs to remain as good as it is now, otherwise the extra sensitivity of the FFA is compromised). Lastly, the processing model where I use non-integer downsampling to maintain a roughly constant phase resolution might not be the way to go on the GPU. So all in all, there are lots of things to think about carefully.
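The point about peak finding is an instance of Amdahl's law: the stages left on the CPU cap the overall gain. The sketch below only illustrates the arithmetic. The function name `overall_speedup` and the 70/30 split of run time are hypothetical, chosen for illustration and not measured from `riptide`.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: end-to-end speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time spent in the
        accelerated stages (here: FFA + matched filtering), between 0 and 1.
    kernel_speedup: speedup factor achieved on those stages alone.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / kernel_speedup)


# Hypothetical split (an assumption, not a measurement):
# 70% of run time in FFA + matched filtering, 30% in peak finding.
kernels_only = overall_speedup(0.7, 10.0)   # peak finding left untouched
ceiling = 1.0 / (1.0 - 0.7)                 # limit for an infinitely fast kernel
print(f"10x faster kernels: {kernels_only:.2f}x overall (ceiling {ceiling:.2f}x)")
```

Under that assumed split, a 10x faster transform yields well under 3x end to end, which is why peak finding would have to be accelerated too.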

PS: If you're bold enough to try implementing this, please do it in small increments, i.e. start by implementing a working FFA transform kernel on the GPU, then a matched filtering kernel, then think about interfacing it with the existing codebase. No pull requests with 500+ lines changed in a single commit, please 😄 If you could demonstrate something with solid potential, then I'd be happy to make deeper changes in the code to accommodate it.
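For the first increment, a slow but simple CPU reference transform is handy as a correctness baseline for any GPU kernel. The following is a minimal NumPy sketch of a recursive fast folding transform, not `riptide`'s actual implementation: the name `ffa_reference`, the head/tail split and the round-half-up rule for the shifts are all assumptions made for illustration.

```python
import numpy as np


def ffa_reference(data):
    """Fast folding transform of an (m, b) array: m pulse profiles of b phase bins.

    Row s of the result is the sum of all input rows after removing a total
    drift of about s bins between the first and the last row.
    """
    data = np.asarray(data, dtype=float)
    m, b = data.shape
    if m == 1:
        return data.copy()

    h = m // 2   # number of rows in the head half
    t = m - h    # number of rows in the tail half
    head = ffa_reference(data[:h])
    tail = ffa_reference(data[h:])

    out = np.empty((m, b))
    for s in range(m):
        # Shift index to use within each half, rounded half-up with integers only
        hs = (2 * s * (h - 1) + (m - 1)) // (2 * (m - 1))
        ts = (2 * s * (t - 1) + (m - 1)) // (2 * (m - 1))
        # s - ts approximates the drift accumulated over the head rows;
        # rotate the tail left by that amount before adding it to the head
        out[s] = head[hs] + np.roll(tail[ts], -(s - ts))
    return out


# Toy input: 4 rows of 8 bins, with a pulse drifting right by one bin per row
m, b = 4, 8
data = np.zeros((m, b))
data[np.arange(m), np.arange(m)] = 1.0

result = ffa_reference(data)
best_shift = int(np.argmax(result.max(axis=1)))
print("best shift:", best_shift, "peak value:", result[best_shift].max())
```

A GPU kernel could then be checked by comparing its output against `ffa_reference` on small random inputs, provided both use the same shift rounding convention.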

astrogewgaw commented 3 years ago
@v-morello I will keep all this in mind. I am planning to make a separate repository for the GPU FFA kernels. If I end up with something that actually works and shows meaningful speed-ups over the CPU implementation, I will let you know, and then we can think about merging my repository into **riptide**. I will also let you know if we get good results with a heterogeneous setup, that is, with dedispersion on the GPU and FFA on the CPU.

**PS**: As for the **clfd** commits, that one's on me 😅. I ended up making a lot of changes. Most of the file removals there are temporary (I will re-add the files later on). The final implementation should be ready for your review soon, and I will let you know as soon as I am done with it. Do review it whenever you can spare some time 😁.
v-morello commented 6 months ago

I'm going to close this because, let's face it, it's not happening, but I'm happy to be proven wrong in the future 😄