modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0
9.87k stars 651 forks source link

FEAT: Conversion to/from dask dataframe #3301

Open Hoeze opened 3 years ago

Hoeze commented 3 years ago

Hi, would it be possible to have zero-copy conversion to/from Dask dataframes in modin?

This is useful in many cases when working directly with the Dask partitions.

Also, I'm using dask_on_ray and would be very happy if dask dataframe would directly work on Modin's partitions.

devin-petersohn commented 3 years ago

Hi @Hoeze, thanks for the question!

We have thought about a to/from Dask Dataframe before, what exactly are you trying to do with Modin/Dask? Is there something you need from Dask that Modin can't do (or vice-versa)?

It might be good to also add as a Dask issue, since we don't have control over whether dask.dataframe can operate on Modin partitions.

Hoeze commented 3 years ago

Hi @devin-petersohn!

We have thought about a to/from Dask Dataframe before, what exactly are you trying to do with Modin/Dask? Is there something you need from Dask that Modin can't do (or vice-versa)?

A main feature that I'd like to use are delayed objects and the lazy execution in Dask Dataframes. The explicit shuffle commands are a very nice feature of dask as well. Also, it's not possible to join a Dask dataframe with a modin dataframe as it is possible with Pandas dataframes.

In each case, it's annoying to 1) write the dataframe to disk and 2) open it again when I want to change the framework.