robinhood / faust

Python Stream Processing

GPU Support in Faust #635

Open datametrician opened 4 years ago

datametrician commented 4 years ago

This is more of a feature request. RAPIDS (https://rapids.ai/) provides GPU-accelerated libraries that follow the PyData APIs. We have a streaming library, custreamz (https://medium.com/rapids-ai/gpu-accelerated-stream-processing-with-rapids-f2b725696a61), which does something similar to Faust. Given that Flink recently added GPU support (https://flink.apache.org/news/2020/08/06/external-resource.html), I was wondering if Faust would be willing to do the same so people could use GPUs for stream processing.

For high-throughput problems, GPUs have proven to be cheaper in production, and they open the door to better deep learning and ML inference on Python streams.

benhowes commented 4 years ago

Q: What's to stop you from using https://rapids.ai/ with Faust at the moment?

datametrician commented 4 years ago

I couldn't find any docs on how Faust supports GPUs. Is it GPU aware?

benhowes commented 4 years ago

Not directly, but I'm suggesting that you can just use a library in your agents to offload processing as required. The magic of Faust is that it's more of a plain Python library than a "way-of-life" framework such as Spark or Flink, so it's relatively easy to combine with other libraries.
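To illustrate that point, here is a minimal sketch of the pattern: an agent hands micro-batches of row events to a columnar library instead of processing one row at a time. The `microbatch` helper below is hypothetical (not part of Faust); inside a real agent, Faust's own `stream.take(max_, within=...)` provides the same batching out of the box.

```python
from typing import Iterable, Iterator, List

def microbatch(events: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group row-shaped events into fixed-size batches, since columnar
    libraries (e.g. cudf.DataFrame(batch)) work best on batches."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

# Inside a Faust agent the same idea is available via
# `async for batch in stream.take(batch_size, within=timeout)`,
# and each batch could be handed to a GPU library without Faust
# needing to know anything about GPUs.
```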

datametrician commented 4 years ago

Cool. Dask is very similar. The only reason integration is sometimes necessary is that with RAPIDS, it's not about offloading; it's about making GPUs the primary form of computing. That said, we'll kick the tires and see what's possible.

jdye64 commented 4 years ago

I took a deep dive into what would be needed to practically make this happen and wanted to share my findings here. First, a few things worth mentioning about dataframes on GPUs:

1) Dataframes are batch focused rather than event (= row) focused like Faust; a dataframe could be considered an "event" to fit the Faust paradigm.
2) Getting data to the GPU and keeping it there without lots of movement is important for the big speedups: pass GPU pointers around instead of copying data.

With that in mind here are the steps I would consider needed to make this happen.

1) Create a GPU Faust driver to ingest Kafka messages directly onto the GPU. We already have one of these in RAPIDS (cudf_kafka, as we call it), so this portion would effectively be a Driver that serves as a manager for cudf_kafka.
2) Faust Transport: same as above, just some small changes to enable using cudf_kafka.
3) Support for a Faust "table iterator" that instead iterates over a cuDF dataframe residing on the GPU.
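Step (3) can be sketched in plain Python. Here a dict of column lists stands in for a cuDF dataframe (an assumption for illustration only); in the real proposal the chunk slices would ideally be zero-copy views over GPU memory rather than host-side copies.

```python
from typing import Dict, Iterator, List

class ChunkedTableIterator:
    """Sketch of a 'table iterator' that yields column-chunks of a
    dataframe instead of one event per row (stand-in for cuDF)."""

    def __init__(self, table: Dict[str, List], chunk_size: int) -> None:
        self.table = table
        self.chunk_size = chunk_size
        # All columns are assumed to have the same length.
        self.nrows = len(next(iter(table.values())))

    def __iter__(self) -> Iterator[Dict[str, List]]:
        for start in range(0, self.nrows, self.chunk_size):
            stop = start + self.chunk_size
            # With cuDF these slices could be views over device memory,
            # avoiding the data movement mentioned above.
            yield {name: col[start:stop] for name, col in self.table.items()}
```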

I believe that with those three steps, Faust could be expanded to accept GPU memory pointers as events. Those pointers would then be passed to the user-defined @app.agent, where the dataframes could be accessed and used as desired without the overhead of additional copies, while still gaining the huge benefits offered by the Faust framework.
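The dataframe-as-event idea can be mimicked in plain asyncio (the names and the column-dict stand-in for cuDF frames are illustrative assumptions, not Faust internals): the agent's stream yields whole frames, and the agent body operates on each frame at once.

```python
import asyncio
from typing import AsyncIterator, Dict, List

async def scoring_agent(
    stream: AsyncIterator[Dict[str, List[float]]],
) -> List[float]:
    """Sketch of an agent whose events are whole dataframes (here dicts
    of columns standing in for cuDF frames) rather than single rows."""
    totals: List[float] = []
    async for frame in stream:
        # In the real proposal this would operate on GPU-resident data
        # reached through a pointer, with no extra copies.
        totals.append(sum(frame["value"]))
    return totals

async def frame_stream(frames):
    # Stand-in for the proposed cudf_kafka-backed transport.
    for frame in frames:
        yield frame

frames = [{"value": [1.0, 2.0]}, {"value": [3.0]}]
totals = asyncio.run(scoring_agent(frame_stream(frames)))
```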

Would be really interested to hear others' thoughts on this.