webcoderz opened 1 year ago
Hi @webcoderz, thanks for bringing this up! How would this work? Would the dataframe always stay a Dask or cuDF dataframe throughout the whole model? And what are the benefits? Would this be faster, then?
Basically it would enable a full GPU pipeline, give a speedup, and allow much larger dataframes to pass through the model. A Dask dataframe can hold a very large dataset: when you read GBs of data into one, it splits the data into some number of partitions (each a pd.DataFrame), so this would reduce bottlenecks in terms of data size and also reduce delays when off-ramping from GPU to CPU.

The cool thing about Dask is that it runs in parallel and supports lazy evaluation, so you can delay compute until the end of your function chain (or wherever is most efficient) and then execute the whole chain in parallel. The original Prophet was GIL-locked, so this wasn't possible there, but it should be possible here.

A lot of considerations have to be made when dealing with large data. For example, one of your df_utils makes a copy of the dataframe, which could blow out memory on a really large dataset. My suggestion would be to start here https://ml.dask.org/pytorch.html and see whether dask-ml supports some of the things you're doing out of the box before going further down this path. I can try to help a bit; I have to read more of the code to understand it better, but it's typically completely possible to use torch and Dask together.
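To make the lazy-evaluation point concrete, here is a minimal sketch of Dask's partitioned, deferred execution model (the file pattern and column names `ds`/`y` are hypothetical, chosen to mirror Prophet-style inputs):

```python
import dask.dataframe as dd

# Reading splits the data into N partitions, each backed by a pd.DataFrame;
# nothing is loaded into memory yet -- Dask only builds a task graph.
ddf = dd.read_csv("events-*.csv")

# Chained operations stay lazy and are fused into the same graph.
ddf = ddf[ddf["y"] > 0]
daily_mean = ddf.groupby("ds")["y"].mean()

# Only .compute() triggers execution, processing partitions in parallel.
result = daily_mean.compute()
print(result.head())
```

The point is that the expensive work happens once, in parallel, at the `.compute()` call, rather than eagerly at each step as with a plain pandas dataframe.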
@webcoderz Thank you for your excellent suggestion, and for highlighting the other device-transfer issue. @leoniewgnr is currently parallelizing some of our older code paths to help speed up data processing (she is mostly done). What you are suggesting sounds like the appropriate next step, as I think you are right that most of our compute is data-pipeline and device-transfer bottlenecked. If you would be up for the challenge, we would love to have a chat with you and discuss how to proceed. BTW, our dev core team (all of us are open-source volunteers) is open to new members. :)
Hi @ourownstory, yeah, I'd be down! I have a lot on my plate currently, but would be happy to help! I think it would be super cool to run this on hundreds of millions of rows of data 😀
Hi! I came across this issue due to the cuDF / Dask mentions (I'm part of the RAPIDS team at NVIDIA that develops cuDF, Dask-CUDA, and a variety of other projects for accelerated computing).
We'd love to see the NeuralProphet community contribute this functionality to the package! I'd be happy to join any discussions on this topic and help try to answer any questions that may come up.
Perfect! This is excellent.
@webcoderz Please let me know if you're still game for this challenge. I'd be happy to hop on a call to discuss it.
I am all here for it! I gave @leoniewgnr my email if you want to reach out to schedule something!
**Problem**
To enable full-GPU data science, it would be really useful to accept a dask-cudf dataframe for data larger than single-GPU memory, or a cuDF dataframe for data that fits entirely on a single GPU, instead of having to pass the data to the CPU for input into model training.

**Solution**
Allow a Dask dataframe or a cuDF dataframe (or both) as input into the model.
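For illustration, a minimal sketch of what the two proposed input paths could look like. The file paths are hypothetical, and the commented-out `m.fit(...)` call on a GPU dataframe is the proposed (not existing) NeuralProphet API:

```python
import cudf
import dask_cudf

# Fits on a single GPU: a plain cuDF dataframe, resident in GPU memory.
gdf = cudf.read_csv("timeseries.csv")

# Larger than single-GPU memory: a Dask-cuDF dataframe,
# partitioned across workers (and potentially multiple GPUs).
dgdf = dask_cudf.read_csv("timeseries-*.csv")

# Proposed usage -- the model would accept either directly,
# avoiding a device-to-host transfer before training:
# m = NeuralProphet()
# m.fit(gdf)    # single-GPU cuDF input
# m.fit(dgdf)   # multi-GPU / out-of-core Dask-cuDF input
```

In both cases the data never leaves the GPU before reaching the model, which is the bottleneck the issue is about.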