rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[QST] How to debug a user-defined function? Breakpoints just won't stop. #16096

Closed langslike closed 2 months ago

langslike commented 4 months ago

I want to debug and step into the function "count_if_gt_3". I set breakpoints in the function, but the program doesn't stop and just runs to completion. How can I debug it? I know how to run a file in debug mode, but with cudf it just doesn't work. I'd be glad if anyone could show me how.

import cudf
def count_if_gt_3(window):
    count = 0
    for i in window:
        if i > 3:
            count += 1
    return count

s = cudf.Series([0, 1.1, 5.8, 3.1, 6.2, 2.0, 1.5])
s.rolling(3, min_periods=1).apply(count_if_gt_3)


brandon-b-miller commented 4 months ago

Hi @langslike, rolling.apply doesn't actually run the real Python function. Instead, it uses Numba's CUDA target to analyze the function you pass and create a GPU equivalent that runs on your data. However, the GPU version should match the Python logic exactly, so if you want to debug, I'd recommend converting the series to pandas and passing the same function:

s.to_pandas().rolling(3, min_periods=1).apply(count_if_gt_3)

After you have confirmed the function does what you want, you can remove the breakpoints and pass it to the cudf series again, I'd expect it to produce the same result.
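If you'd like to see exactly which values each call to the UDF receives, a minimal sketch is a plain-Python loop that mimics trailing-window `rolling(window, min_periods).apply(func)`, so a breakpoint inside the UDF is hit once per window. The helper `rolling_apply_py` is hypothetical (it is not part of cudf or pandas) and assumes the input contains no missing values.

```python
import math


def count_if_gt_3(window):
    # Same UDF as in the question: count the values in the window above 3.
    count = 0
    for i in window:
        if i > 3:
            count += 1
    return count


def rolling_apply_py(values, window, min_periods, func):
    """Hypothetical CPU-only stand-in for rolling(window, min_periods).apply(func).

    Assumes trailing windows that end at the current row and no missing
    values. Every call to func runs as ordinary Python, so a debugger
    breakpoint inside func is hit once per window.
    """
    out = []
    for end in range(len(values)):
        start = max(0, end - window + 1)
        win = values[start:end + 1]
        if len(win) < min_periods:
            # Too few observations in this window: pandas reports NaN here.
            out.append(math.nan)
        else:
            out.append(float(func(win)))
    return out


data = [0, 1.1, 5.8, 3.1, 6.2, 2.0, 1.5]
result = rolling_apply_py(data, window=3, min_periods=1, func=count_if_gt_3)
print(result)  # [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
```

Once the logic looks right here (or in pandas), the unchanged `count_if_gt_3` can be handed back to the cudf series.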

langslike commented 4 months ago

@brandon-b-miller Thanks for your quick reply. I have some knowledge of pandas and I know what pandas will do with the "count_if_gt_3" function; I just wonder whether cudf will behave like pandas. Do you mean that I actually cannot probe what's going on under the hood with cudf? Should I just compare the output of the cudf version of the code with the output of the pandas version to confirm that I wrote the UDF correctly?

brandon-b-miller commented 4 months ago

Yes, the intention is that cudf will produce the same result as pandas for any function that we accept through the API. If a function was found that broke this rule somehow, it would be considered a bug in cuDF. For simple functions, this should allow debugging using pandas with the expectation things will work on the GPU as well.
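To make the "compare cudf against pandas" check concrete, a minimal sketch is a small comparison of the two result lists that treats missing values as equal and allows a tiny floating-point tolerance, since GPU and CPU arithmetic can differ slightly in the last bits. The helper `results_match` and its tolerance defaults are hypothetical choices, not part of either library.

```python
import math


def _is_missing(x):
    # Treat both None (e.g. from Arrow/list conversions) and NaN as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))


def results_match(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """Hypothetical helper: compare two result lists element by element.

    Missing values must appear at the same positions; numeric values must
    agree within a small tolerance.
    """
    if len(expected) != len(actual):
        return False
    for e, a in zip(expected, actual):
        if _is_missing(e) or _is_missing(a):
            # A missing value on only one side is a mismatch.
            if not (_is_missing(e) and _is_missing(a)):
                return False
        elif not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True
```

On a machine with a GPU, the CPU list could come from `s.to_pandas().rolling(3, min_periods=1).apply(count_if_gt_3).tolist()` and the GPU list from `s.rolling(3, min_periods=1).apply(count_if_gt_3).to_pandas().tolist()`; `pandas.testing.assert_series_equal` on the two converted Series is another option.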

langslike commented 4 months ago

I tried to write a user-defined function (UDF), but I got "Unknown attribute 'mean' of type array(float64, 1d, A)". How can I deal with it? Error output:

Traceback (most recent call last):
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/xxxx/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/xxxx/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/xxxx/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/xxxx/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/xxxx/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/xxxx/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/xxxx/quant/test_cudf.py", line 16, in <module>
    print(df_g['close'].rolling(5).apply(mean))
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/cudf/core/window/rolling.py", line 415, in apply
    return self._apply_agg(func)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/cudf/core/window/rolling.py", line 267, in _apply_agg
    self.obj.name: self._apply_agg_column(
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/cudf/core/window/rolling.py", line 243, in _apply_agg_column
    return libcudf.rolling.rolling(
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "rolling.pyx", line 62, in cudf._lib.rolling.rolling
  File "aggregation.pyx", line 230, in cudf._lib.aggregation.make_aggregation
  File "aggregation.pyx", line 187, in cudf._lib.aggregation.Aggregation.from_udf
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/cudf/utils/cudautils.py", line 126, in compile_udf
    ptx_code, return_type = cuda.compile_ptx_for_current_device(
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/cuda/compiler.py", line 391, in compile_ptx_for_current_device
    return compile_ptx(pyfunc, sig, debug=debug, lineinfo=lineinfo,
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/cuda/compiler.py", line 380, in compile_ptx
    return compile(pyfunc, sig, debug=debug, lineinfo=lineinfo, device=device,
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/cuda/compiler.py", line 330, in compile
    cres = compile_cuda(pyfunc, return_type, args, debug=debug,
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/cuda/compiler.py", line 196, in compile_cuda
    cres = compiler.compile_extra(typingctx=typingctx,
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler.py", line 744, in compile_extra
    return pipeline.compile_extra(func)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler.py", line 438, in compile_extra
    return self._compile_bytecode()
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler.py", line 506, in _compile_bytecode
    return self._compile_core()
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler.py", line 485, in _compile_core
    raise e
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler.py", line 472, in _compile_core
    pm.run(self.state)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler_machinery.py", line 368, in run
    raise patched_exception
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler_machinery.py", line 356, in run
    self._runPass(idx, pass_inst, state)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler_lock.py", line 35, in _acquire_compile_lock
    return func(*args, **kwargs)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler_machinery.py", line 311, in _runPass
    mutated |= check(pss.run_pass, internal_state)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/compiler_machinery.py", line 273, in check
    mangled = func(compiler_state)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/typed_passes.py", line 112, in run_pass
    typemap, return_type, calltypes, errs = type_inference_stage(
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/typed_passes.py", line 93, in type_inference_stage
    errs = infer.propagate(raise_errors=raise_errors)
  File "/home/xxxx/miniconda3/envs/rapids/lib/python3.9/site-packages/numba/core/typeinfer.py", line 1091, in propagate
    raise errors[0]
numba.core.errors.TypingError: Failed in cuda mode pipeline (step: nopython frontend)
Unknown attribute 'mean' of type array(float64, 1d, A)

File "test_cudf.py", line 10:
def mean(s):
    return s.mean()
    ^

During: typing of get attribute at /home/xxxx/quant/test_cudf.py (10)

File "test_cudf.py", line 10:
def mean(s):
    return s.mean()

Below is my code:

import pandas as pd
import cudf

def mean(s):
    return s.mean()    # I know cudf has its rolling.mean, this is just a demo

stock = pd.read_parquet('stock.parquet')
df_gpu = cudf.from_pandas(stock)
print(df_gpu['close'].rolling(5).apply(mean))

The stock data is a DataFrame with a multi-level index: [screenshot]

brandon-b-miller commented 4 months ago

Unfortunately, user-defined functions for rolling windows support only a limited subset of Python. Generally, the supported features are those listed in the documentation, and as of today they do not include array methods or NumPy array APIs within UDFs. This means that to compute the mean, you would have to write a scalar implementation that sums up the array elements and then divides by the length.
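Following that suggestion, a minimal sketch of a scalar `mean` UDF is shown below. It uses only a `for` loop, float arithmetic and `len`, which we assume fall within the supported subset; it has not been verified against a specific cuDF or Numba version. It also assumes every window holds at least one value, which holds with the default `min_periods` (equal to the window size).

```python
def mean(window):
    # Scalar-only mean: accumulate the sum element by element, then divide
    # by the number of elements. This avoids calling window.mean(), which the
    # CUDA UDF compiler rejects with "Unknown attribute 'mean'".
    total = 0.0
    for x in window:
        total += x
    # Assumes a non-empty window (see the lead-in).
    return total / len(window)
```

It would then be passed exactly as before, e.g. `df_gpu['close'].rolling(5).apply(mean)`, and the same function can first be checked on the CPU by calling it on a plain list or via pandas.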

vyasr commented 2 months ago

Going to close as answered, but please feel free to reopen as needed. The UX here will continue to improve over time, but there are some things (like breakpoints) that will not be possible with cudf for the foreseeable future.