rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

MemoryError: cudaErrorIllegalAddress an illegal memory access was encountered when transferring data from CPU to GPU on Google Colab #16238

Open MostafaBouzari opened 2 months ago

MostafaBouzari commented 2 months ago

I tried to run my code, which works perfectly fine offline, on Google Colab. When converting data from CPU to GPU for ML training with cuML, I get an error.

Here is the relevant part of my code:

for _interval in list(range(0,26)):
    print(_interval)
    train,capm_1,capm_3,capm_12,to_be_predicted_df,evaluate,test=adjust_dates_by_month(dax_df,df,True,-1,2015,3,interval=_interval)
# df.drop(['PrefStock_w', 'PrefDiv_w'], axis=1, inplace=True)
    train['Date'] = train['Date'].astype('str')
    evaluate['Date'] = evaluate['Date'].astype('str')
    for col in train.select_dtypes(include=['Float64', 'Int64']).columns:
        train[col] = train[col].astype(float)
    for col in evaluate.select_dtypes(include=['Float64', 'Int64']).columns:
        evaluate[col] = evaluate[col].astype(float)
    evaluate=cpd.from_dataframe(evaluate,allow_copy=True)
    train=cpd.from_dataframe(train,allow_copy=True)

The last two lines cause this error message:

MemoryError                               Traceback (most recent call last)
<ipython-input-22-f0b3ec832a19> in <cell line: 47>()
     55     for col in evaluate.select_dtypes(include=['Float64', 'Int64']).columns:
     56         evaluate[col] = evaluate[col].astype(float)
---> 57     evaluate=cpd.from_dataframe(evaluate,allow_copy=True)
     58     train=cpd.from_dataframe(train,allow_copy=True)
     59 

3 frames
/usr/local/lib/python3.10/dist-packages/cudf/core/dataframe.py in from_dataframe(df, allow_copy)
   7788 
   7789 def from_dataframe(df, allow_copy=False):
-> 7790     return df_protocol.from_dataframe(df, allow_copy=allow_copy)
   7791 
   7792 

/usr/local/lib/python3.10/dist-packages/cudf/core/df_protocol.py in from_dataframe(df, allow_copy)
    728 
    729         elif col.dtype[0] == _DtypeKind.STRING:
--> 730             columns[name], _buf = _protocol_to_cudf_column_string(
    731                 col, allow_copy
    732             )

/usr/local/lib/python3.10/dist-packages/cudf/core/df_protocol.py in _protocol_to_cudf_column_string(col, allow_copy)
    871     assert buffers["data"] is not None, "data buffer should never be None"
    872     data_buffer, data_dtype = buffers["data"]
--> 873     data_buffer = _ensure_gpu_buffer(data_buffer, data_dtype, allow_copy)
    874     encoded_string = build_column(
    875         data_buffer._buf,

/usr/local/lib/python3.10/dist-packages/cudf/core/df_protocol.py in _ensure_gpu_buffer(buf, data_type, allow_copy)
    774     if buf.__dlpack_device__()[0] != _Device.CUDA:
    775         if allow_copy:
--> 776             dbuf = rmm.DeviceBuffer(ptr=buf.ptr, size=buf.bufsize)
    777             return _CuDFBuffer(
    778                 as_buffer(dbuf, exposed=True),

device_buffer.pyx in rmm._lib.device_buffer.DeviceBuffer.__cinit__()

MemoryError: std::bad_alloc: CUDA error at: /__w/cudf/cudf/python/cudf/build/cp310-cp310-linux_x86_64/_deps/rmm-src/include/rmm/mr/device/cuda_memory_resource.hpp:60: cudaErrorIllegalAddress an illegal memory access was encountered
bdice commented 2 months ago

@MostafaBouzari Is it possible to provide a minimal reproducible code snippet, or a full notebook with data files? We'd be happy to investigate this, and a bit more information would help.

MostafaBouzari commented 2 months ago

@bdice I have attached two Python files. The first version of the ML file works perfectly fine in an Ubuntu environment. The second version works in the Google Colab environment.

You will notice that in some cases I had to convert my DataFrame from the GPU (cuDF) to pandas, because some functionality available in pandas, such as pd.qcut and pd.Timestamp, was not available.

I managed to work around the error mentioned above by constructing the DataFrames directly:

    evaluate=cpd.DataFrame(evaluate)
    train=cpd.DataFrame(train)

I hope this helps with further investigation. Many thanks.

Attachment: Google Colab Bug.zip

MostafaBouzari commented 2 months ago

@bdice Unfortunately, that wasn't the only error I came across. If you upload the first version of my file to Google Colab and try to run it, you'll run into multiple bugs and issues; the workarounds for them can be found in the second version of my code.

Matt711 commented 2 months ago

Hey @MostafaBouzari, I wasn't able to reproduce with your first notebook. I get the following... Do you have an example notebook that reproduces the error?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-68ef0eb15efb> in <cell line: 47>()
     47 for _interval in list(range(0,26)):
     48     print(_interval)
---> 49     train,capm_1,capm_3,capm_12,to_be_predicted_df,evaluate,test=adjust_dates_by_month(dax_df,df,True,1,2015,3,interval=_interval)
     50 # df.drop(['PrefStock_w', 'PrefDiv_w'], axis=1, inplace=True)
     51     train['Date'] = train['Date'].astype('str')

4 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py in _maybe_coerce_merge_keys(self)
   1399                     inferred_right in string_types and inferred_left not in string_types
   1400                 ):
-> 1401                     raise ValueError(msg)
   1402 
   1403             # datetimelikes must match exactly

ValueError: You are trying to merge on int32 and object columns. If you wish to proceed you should use pd.concat
mroeschke commented 2 months ago

cpd.from_dataframe

I see from skimming your notebook that you're dealing with pandas DataFrames exclusively, correct? If so, we recommend you use from_pandas instead.

from_dataframe is better suited when you have a dataframe library that's not pandas but implements the __dataframe__ interchange protocol.
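
As a rough sketch of that recommendation (not code from the notebook): the helper name `to_cudf` below is hypothetical, and it assumes that the `cpd` alias in the original snippet refers to `cudf`. It sends pandas input through `cudf.from_pandas` and falls back to `cudf.from_dataframe` only for other libraries that expose `__dataframe__`.

```python
def to_cudf(df):
    """Move a DataFrame to the GPU, picking the converter by input type.

    Hypothetical helper for illustration only; it is not part of cuDF.
    """
    # Imported lazily so this module can still be loaded on a machine
    # without a GPU; the imports only run when the helper is called.
    import pandas as pd
    import cudf

    if isinstance(df, pd.DataFrame):
        # pandas input: use the dedicated converter recommended above.
        return cudf.from_pandas(df)

    # Any other library that implements __dataframe__: go through the
    # interchange protocol, allowing a host-to-device copy.
    return cudf.from_dataframe(df, allow_copy=True)


# Possible usage inside the loop from the original report (untested here):
#     evaluate = to_cudf(evaluate)
#     train = to_cudf(train)
```

Whether this avoids the `cudaErrorIllegalAddress` on Colab has not been confirmed in this thread; it only swaps the conversion path to the one suggested for pandas input.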