modin-project / modin

Modin: Scale your Pandas workflows by changing a single line of code
http://modin.readthedocs.io
Apache License 2.0

Adapt pandas Copy-on-Write functionality for modin #6254

Open anmyachev opened 1 year ago

anmyachev commented 1 year ago

https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#copy-on-write-improvements

It should be interesting in terms of performance.

The main problem is enabling this option in all processes. At the moment, none of the options can be changed in Modin processes.

YarShev commented 1 year ago

Not sure we are able to support all this functionality because Modin treats partitions as immutable objects and we create a new partition in a remote function every time.
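To make the immutability point concrete, here is a minimal sketch. `Partition` and `apply` are hypothetical names used only for illustration, not Modin's actual classes or API, and plain tuples stand in for pandas data. It only shows the pattern described above: every operation returns a new partition and never touches the input.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for a partition; its data is never modified in place."""

    data: Tuple[int, ...]

    def apply(self, func: Callable[[Tuple[int, ...]], Tuple[int, ...]]) -> "Partition":
        # Mimics a remote function: the result is always a brand-new partition.
        return Partition(tuple(func(self.data)))


original = Partition((1, 2, 3))
doubled = original.apply(lambda values: tuple(v * 2 for v in values))

print(original.data)  # the input partition is unchanged
print(doubled.data)   # the result lives in a separate object
```

Since no partition is ever written to after it is created, there is no in-place mutation across partitions for Copy-on-Write to defer. Any benefit would have to come from the pandas calls made inside a single remote function.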

anmyachev commented 1 year ago

> Not sure we are able to support all this functionality because Modin treats partitions as immutable objects and we create a new partition in a remote function every time.

For some Modin functions that use several pandas operations in a row in their implementation, this could be useful.

vnlitvinov commented 1 year ago

> At the moment, none of the options can be changed in Modin processes.

There is a (somewhat peculiar) way, as I said in the pandas 2.0 PR: https://github.com/modin-project/modin/pull/5995#issuecomment-1576166040

anmyachev commented 1 year ago

> > At the moment, none of the options can be changed in Modin processes.
>
> There is a (somewhat peculiar) way, as I said in the pandas 2.0 PR: #5995 (comment)

It seems to me that you are talking about the case where an option's value has to be changed when Modin starts. What if the user needs to change the value after startup, for example in the middle of a workload?
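A rough sketch of the "change after start" problem, using only the standard library. Everything here is an assumption made for illustration: the worker script, the `set`/`get` text protocol, and `broadcast_option` are hypothetical and say nothing about how Modin or its execution engines actually manage workers. It only shows that option state is process-local, so a change in the driver reaches workers that are already running only if it is explicitly sent to each of them.

```python
import subprocess
import sys

# Hypothetical worker: holds a process-local options dict (a stand-in for
# pandas options) and answers simple text commands read from stdin.
WORKER_SOURCE = r"""
import sys
options = {"copy_on_write": False}
for line in sys.stdin:
    cmd, _, arg = line.strip().partition(" ")
    if cmd == "set":
        options["copy_on_write"] = (arg == "True")
        print("ok", flush=True)
    elif cmd == "get":
        print(options["copy_on_write"], flush=True)
"""


def start_worker():
    return subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def ask(worker, message):
    """Send one command line to a worker and return its one-line reply."""
    worker.stdin.write(message + "\n")
    worker.stdin.flush()
    return worker.stdout.readline().strip()


def broadcast_option(workers, value):
    """Push the new option value to every running worker."""
    return [ask(w, f"set {value}") for w in workers]


def demo():
    driver_options = {"copy_on_write": False}
    workers = [start_worker() for _ in range(2)]
    try:
        # Changing the option in the driver does not affect running workers.
        driver_options["copy_on_write"] = True
        before = [ask(w, "get") for w in workers]
        # Only an explicit broadcast brings the workers in sync.
        broadcast_option(workers, driver_options["copy_on_write"])
        after = [ask(w, "get") for w in workers]
    finally:
        for w in workers:
            w.stdin.close()
            w.wait()
            w.stdout.close()
    return before, after
```

Calling `demo()` returns the workers' view of the option before and after the broadcast. In a real deployment the broadcast would also have to reach workers that start or restart later, which this sketch does not cover.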