Open paulsbrookes opened 6 months ago
Thanks for reporting @paulsbrookes!
The isinstance checks against np.ndarray failing is a significant but known limitation currently. Ideally, more code would accept "array-like" objects rather than doing hard isinstance checks against np.ndarray. This can be done by, e.g., checking for the __array_interface__ attribute, or using try-except with a call to np.asarray. But I know a lot of third-party code is not written this way. We hope to have a resolution for this soon.
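For illustration, here is a minimal sketch of the two approaches mentioned above. ProxyArray, is_array_like, and as_float_array are hypothetical names made up for this example, not part of cuDF, pandas, or NumPy; ProxyArray only stands in for an object that wraps array data without subclassing np.ndarray.

```python
import numpy as np


class ProxyArray:
    """Hypothetical stand-in for an array-like that is NOT an np.ndarray subclass."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def __array_interface__(self):
        # Delegate to the wrapped array's interface so NumPy can read the buffer.
        return self._data.__array_interface__


def is_array_like(obj):
    """Duck-typed check: a real ndarray, or anything exposing __array_interface__."""
    return isinstance(obj, np.ndarray) or hasattr(obj, "__array_interface__")


def as_float_array(obj):
    """Try-except variant: let np.asarray decide whether obj is convertible."""
    try:
        return np.asarray(obj, dtype=float)
    except (TypeError, ValueError) as exc:
        raise TypeError(
            f"expected a numeric array-like, got {type(obj).__name__}"
        ) from exc


proxy = ProxyArray([1, 2, 3])
print(isinstance(proxy, np.ndarray))  # the hard check rejects the proxy
print(is_array_like(proxy))           # the duck-typed check accepts it
print(as_float_array(proxy))          # conversion goes through np.asarray
```

A library written this way would keep working with any object that NumPy itself can convert, at the cost of materializing the data as a host-side NumPy array.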
The other isinstance check, against BaseOffset, seems to be a bug. I'll investigate and report back.
Thanks @shwina!
One workaround I've found for dealing with situations where third-party libraries do not know how to deal with a cuDF pandas DataFrame is to recreate the dataframe as follows:
import cudf.pandas
import numpy as np
cudf.pandas.install()  # must run before pandas is imported
import pandas as pd
from cudf.pandas.module_accelerator import disable_module_accelerator

df = pd.DataFrame([0, 1, 2])
with disable_module_accelerator():
    # Inside this block, pd refers to regular (CPU) pandas.
    column_types = {col: dtype for col, dtype in df.dtypes.items()}
    df = pd.DataFrame(df, columns=df.columns, index=df.index)
    df = df.astype(column_types)
assert isinstance(df.to_numpy(), np.ndarray)
The code block above passes without an assertion error. If I understand correctly this creates a regular pandas dataframe from the cudf pandas dataframe which can then be used by third party libraries as normal.
@shwina do you know of any alternatives to this?
> If I understand correctly this creates a regular pandas dataframe from the cudf pandas dataframe which can then be used by third party libraries as normal.
Correct - it also means that third-party libraries won't be able to leverage the GPU in any way.
> @shwina do you know of any alternatives to this?
No - that's right. The disable_module_accelerator context manager is how you can temporarily make it so that pd is in fact the "real" pandas. We don't document this quite yet, and in an ideal world you wouldn't have to use it at all, but it's fine as a workaround for now.
I opened https://github.com/rapidsai/cudf/pull/14678, which addresses the first issue you raised (isinstance checks against pd.tseries.offsets.BaseOffset).
Thanks @shwina!
Describe the bug
Some type checks fail with cuDF pandas objects.
Steps/Code to reproduce bug
The following examples fail with assertion errors:
Both of these examples pass if we remove the cudf.pandas.install() line.
Expected behavior
I expected the code blocks above to run so that I could use the accelerated version of pandas with zero code changes. The errors I'm facing make it difficult to work with cuDF pandas and other libraries (e.g. https://github.com/Nixtla/statsforecast).