modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

Add GPU support to Modin #2538

Open xiwen1995 opened 3 years ago

xiwen1995 commented 3 years ago

My name is Xiwen Zhang, and I am a CS PhD student at Georgia Tech, working with Prof. Alexey Tumanov. Over the last 4 months, our team has been working on adding GPU support to Modin. We have finished about 50 APIs, and our preliminary evaluation results are promising. We would like to merge our code into the Modin project. In this issue, I will briefly describe the major design choices we made to add GPU support to Modin, and our plan for merging the code.

Design Choices:

Code Merging Plan (which files will be affected):

To make the job of code reviewing easier, we plan to merge incrementally. All the comments/feedback will be greatly appreciated. Also, feel free to ask me any question you may have.

anmyachev commented 3 years ago

Hello @xiwen1995!

First of all, thank you for wanting to combine your great work with Modin.

Your architectural decisions for adding a new backend look exactly as we imagined. A quick note: a new backend should start out under the modin/experimental folder, so the paths would be modin/experimental/backends/cudf/ and modin/experimental/engines/ray/cudf_on_ray/.
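For context, Modin selects its engine and backend via environment variables (or modin.config) set before the first import. A minimal sketch of how the proposed backend might be selected; the cudf value and the script name are assumptions inferred from the paths above, not a released feature:

```shell
# Hypothetical: select the Ray engine with the proposed cuDF backend.
# MODIN_ENGINE is a real Modin variable; the "cudf" value is assumed here.
export MODIN_ENGINE=ray
export MODIN_BACKEND=cudf    # assumed name for the proposed backend
python my_workflow.py        # my_workflow.py is a placeholder script
```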

The approach of using one Ray actor per GPU is also interesting, as are the results you were able to obtain. Can you share some of them?

cc @devin-petersohn
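The actor-per-GPU design mentioned above can be sketched in plain Python, leaving Ray and CUDA out. This is a minimal illustration with hypothetical names (GPUActor, assign_partitions are not Modin APIs): one long-lived worker per GPU, with DataFrame partitions distributed round-robin across the workers.

```python
class GPUActor:
    """Stand-in for a Ray actor pinned to one GPU (no real CUDA calls)."""

    def __init__(self, gpu_id):
        self.gpu_id = gpu_id
        self.partitions = {}  # partition key -> partition data

    def put(self, key, partition):
        # In the real design this would hold a cudf.DataFrame on the device.
        self.partitions[key] = partition


def assign_partitions(num_partitions, num_gpus):
    """Distribute partitions round-robin across one actor per GPU."""
    actors = [GPUActor(i) for i in range(num_gpus)]
    for p in range(num_partitions):
        actors[p % num_gpus].put(p, f"partition-{p}")
    return actors


actors = assign_partitions(num_partitions=8, num_gpus=2)
print([sorted(a.partitions) for a in actors])  # → [[0, 2, 4, 6], [1, 3, 5, 7]]
```

In the actual proposal, each actor would be a `@ray.remote(num_gpus=1)` class, so Ray's scheduler enforces the one-actor-per-GPU placement.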

devin-petersohn commented 3 years ago

Thanks @xiwen1995 and @anmyachev.

I don't think we need to add it to experimental. Like the Dask engine, we can just give a warning. The experimental flag is more designed around things that are still under heavy development and not usable for most people. I believe that the GPU support by @xiwen1995 and team is usable in its current state.

Thanks again @xiwen1995, @kvu35 and others! Let me know how I can help!

xiwen1995 commented 3 years ago

Thanks @devin-petersohn and @anmyachev. Yes, the GPU support is usable now, although only a subset of operators are covered.

mhoangvslev commented 3 years ago

Is there any instruction available to make use of this awesome feature?

On a pure rapidsai environment, I do this to spill work over to CPU memory when the GPU is saturated:

```python
import cudf
cudf.set_allocator("managed")
```

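On newer RAPIDS releases the same managed-memory behavior is configured through RMM directly rather than `cudf.set_allocator`; a sketch of the equivalent configuration, assuming `rmm` is installed:

```python
import rmm

# Route device allocations through CUDA managed (unified) memory,
# so data can migrate between GPU and host when the GPU is oversubscribed.
rmm.reinitialize(managed_memory=True)
```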
devin-petersohn commented 3 years ago

@mhoangvslev We are working on the integration now! This functionality is for multi-machine, multi-GPU support.

mhoangvslev commented 3 years ago

> @mhoangvslev We are working on the integration now! This functionality is for multi-machine, multi-GPU support.

Surely this would be helpful to use in hosted ML instances!

tianlinzx commented 2 years ago

Any updates on this issue ?

mvashishtha commented 2 years ago

@tianlinzx there is some experimental and partial cudf support in the Modin source, but I hear from @prutskov that it's mostly not working. Modin's CI doesn't test whether the cudf support is working. I don't have a cudf setup I can use to manually test either.

AFAIK no regular Modin contributors are trying to improve GPU support right now.

We'll leave this issue as the canonical one for GPU support.

jaysin60 commented 4 months ago

Hi - is anyone still working on this, or is the implementation dead for now?

YarShev commented 4 months ago

There is experimental support for Intel GPUs through HDK (https://modin.readthedocs.io/en/stable/development/using_hdk.html#running-on-a-gpu), but it is not tested in CI.