Open galipremsagar opened 4 months ago
We could hide this in a new mode, option, or env var ("synchronized-memory-mode" say) for the user. There would be new methods for the proxy object:
def _sync_gpu(self): ...  # propagate an in-place write op done on the CPU copy to the GPU copy
def _sync_cpu(self): ...  # propagate an in-place write op done on the GPU copy to the CPU copy
And a job queue.
job_queue = [op1, op2, ...]
Call `_sync_cpu` in a separate process. Do operations on the GPU and insert them in `job_queue`. When there's fallback, wait until `_sync_cpu` finishes (i.e. until the `job_queue` is empty), and then do the operation on the CPU. Now call `_sync_gpu` to do the CPU-->GPU transfer.
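To make the flow above concrete, here is a rough sketch rather than a proposed implementation. It assumes a few things not stated in the thread: plain Python lists stand in for the GPU and CPU copies, a background thread stands in for the separate process, `_sync_cpu` replays each queued op on the CPU copy (it could equally copy data over), and `SyncedProxy`, `inplace` and `gpu_supported` are made-up names.

```python
import queue
import threading


class SyncedProxy:
    """Toy proxy holding a 'gpu' and a 'cpu' copy of the same data (plain lists here)."""

    def __init__(self, values):
        self._gpu = list(values)          # stand-in for the GPU copy
        self._cpu = list(values)          # stand-in for the CPU copy
        self.job_queue = queue.Queue()    # pending in-place writes for the CPU copy
        # Assumption: a daemon thread plays the role of the "separate process".
        threading.Thread(target=self._sync_cpu, daemon=True).start()

    def _sync_cpu(self):
        # Background worker: replay in-place writes done on the GPU copy onto the CPU copy.
        while True:
            op = self.job_queue.get()
            op(self._cpu)
            self.job_queue.task_done()

    def _sync_gpu(self):
        # After an in-place write on the CPU copy, refresh the GPU copy (cpu --> gpu).
        self._gpu[:] = self._cpu

    def inplace(self, op, gpu_supported=True):
        if gpu_supported:
            op(self._gpu)                 # fast path: do the write on the GPU copy
            self.job_queue.put(op)        # and schedule the same write for the CPU copy
        else:
            self.job_queue.join()         # fallback: wait until the job queue is empty
            op(self._cpu)                 # do the operation on the CPU copy
            self._sync_gpu()              # then bring the GPU copy up to date


proxy = SyncedProxy([3, 1, 2])
proxy.inplace(lambda xs: xs.sort())                          # runs on the "GPU" copy
proxy.inplace(lambda xs: xs.append(0), gpu_supported=False)  # simulated fallback
proxy.job_queue.join()
print(proxy._gpu, proxy._cpu)
```

The fallback path only blocks when the CPU copy is behind; while ops stay on the GPU, the CPU copy catches up in the background.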
cc. @galipremsagar
The job queue is filled with in-place write operations. The steps I described before would be for in-place operations. For non-in-place operations, there are no cpu<-->gpu memory transfers. And the operation is tried on the cpu if there's fallback.
Is your feature request related to a problem? Please describe.
In `cudf.pandas` we currently move entire dataframes from CPU to GPU, or vice versa, at every step. We can avoid performing these transfers all the time by storing the dataframe in both memories and spending time on CPU<->GPU transfers only when there are in-place operations on the frames.
Notice that `df.count(axis=0)` in cell 6 takes quite a bit of time moving from CPU to GPU; we can avoid this.
Describe the solution you'd like
Maintain two identical copies of the dataframe: one on the GPU, another on the CPU.
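As a rough illustration of why two copies help, the sketch below counts simulated transfers. It makes no claim about how `cudf.pandas` is or would be implemented: `DualCopyFrame`, `read`, `write`, `gpu_supported` and the `transfers` counter are hypothetical names, and plain dicts of lists stand in for the GPU and CPU frames.

```python
class DualCopyFrame:
    """Toy frame keeping identical 'gpu' and 'cpu' copies (plain dicts of lists)."""

    def __init__(self, columns):
        self._gpu = {k: list(v) for k, v in columns.items()}  # stand-in for the GPU copy
        self._cpu = {k: list(v) for k, v in columns.items()}  # stand-in for the CPU copy
        self.transfers = 0  # number of simulated CPU<->GPU copies

    def read(self, op, gpu_supported=True):
        # Non-in-place op: run it on whichever copy supports it, no transfer needed.
        target = self._gpu if gpu_supported else self._cpu
        return op(target)

    def write(self, op, gpu_supported=True):
        # In-place op: run it on one copy, then pay one transfer to resync the other.
        if gpu_supported:
            op(self._gpu)
            self._cpu = {k: list(v) for k, v in self._gpu.items()}
        else:
            op(self._cpu)
            self._gpu = {k: list(v) for k, v in self._cpu.items()}
        self.transfers += 1


def count(frame):
    # Rough analogue of df.count(axis=0): non-null values per column.
    return {k: sum(x is not None for x in v) for k, v in frame.items()}


df = DualCopyFrame({"a": [1, None, 3], "b": [4, 5, 6]})
counts_gpu = df.read(count)                       # served from the "GPU" copy
counts_cpu = df.read(count, gpu_supported=False)  # fallback, served from the "CPU" copy
transfers_after_reads = df.transfers
df.write(lambda f: f["a"].append(7), gpu_supported=False)  # in-place op on fallback
```

In this toy model the two reads cost no transfers, and only the in-place write triggers one.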