modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

Zero-copy ray.get for to_pandas #2815

Open vnlitvinov opened 3 years ago

vnlitvinov commented 3 years ago

Since Ray uses pickle protocol 5, it should be possible to deserialize partitions in place when assembling the joined dataframe, instead of copying them around.
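For illustration, here is a minimal sketch of the out-of-band mechanism in pickle protocol 5. It uses only the standard library `pickle` module, not Ray or Modin, and `partition` is a hypothetical stand-in for one partition's data.

```python
import pickle

# Hypothetical stand-in for one partition's data: 1 MiB of writable bytes.
partition = bytearray(1024 * 1024)

# With protocol 5, a PickleBuffer can be handed to buffer_callback instead of
# being written into the pickle stream (out-of-band serialization).
out_of_band = []
payload = pickle.dumps(
    pickle.PickleBuffer(partition),
    protocol=5,
    buffer_callback=out_of_band.append,
)

# The stream itself carries only a few opcodes; the data lives in out_of_band.
print(len(payload))

# Loading with the same buffers hands back an object backed by the original
# memory, so no copy of the 1 MiB block is made.
restored = pickle.loads(payload, buffers=out_of_band)

# A write to the original is visible through the restored object.
partition[0] = 42
print(memoryview(restored)[0])
```

Whether the same trick applies to whole dataframe partitions depends on how their blocks are laid out, which this sketch does not model.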

vnlitvinov commented 3 years ago

cc @YarShev

YarShev commented 6 months ago

ray.get() can deserialize data zero-copy for primitive data types (if the object supports pickle protocol 5). However, we then intentionally copy the data so that the user can mutate it, so I don't think there is anything we can do here to speed up to_pandas. I think we can close the issue. @anmyachev, @dchigarev, thoughts?
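A rough sketch of that trade-off, again using only the standard library `pickle` module rather than Ray itself: a buffer deserialized zero-copy comes back as a read-only view, so mutation needs an explicit copy. The names `shared` and `private` are hypothetical, and `shared` only imitates object-store memory.

```python
import pickle

# An immutable source, so the buffer is marked read-only in the pickle stream.
source = bytes(8)
buffers = []
payload = pickle.dumps(
    pickle.PickleBuffer(source),
    protocol=5,
    buffer_callback=buffers.append,
)

# Hypothetical stand-in for memory owned by an object store.
shared = bytearray(buffers[0])

# Zero-copy load: the result is a read-only memoryview over `shared`.
restored = pickle.loads(payload, buffers=[shared])

# Writing through the zero-copy view is rejected.
try:
    restored[0] = 255
    write_failed = False
except TypeError:
    write_failed = True

# To give the user something mutable, the data has to be copied.
private = bytearray(restored)
private[0] = 255  # changes the copy only; `shared` is untouched
```

The copy is what makes the result safe to mutate, and it is also the cost that a zero-copy path would otherwise save.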