rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] Rolling window's apply function throws `TypingError` #6267

Open Salonijain27 opened 4 years ago

Salonijain27 commented 4 years ago

When I call `apply` on a rolling window with a function that operates on the window as an array, or on anything other than a single value, I get the following error: `TypingError: Failed in nopython mode pipeline (step: nopython frontend)`

Code to reproduce the error:

import cudf
import numpy as np
import math
def groll_sort(x):
    t = x.median() #np.median(x.values)
    return t
df = cudf.DataFrame()
df['a'] = (0.25, 0.3, 0.5,1,3,1,-1,3,-2)
rolling = df.rolling(window=3).apply(groll_sort)
print(rolling)

Note: I also tried using `t = np.median(x.values)` in the function.

Running the above code gives the following error:

---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-6-99e3758f2d02> in <module>
      7 df = cudf.DataFrame()
      8 df['a'] = (0.25, 0.3, 0.5,1,3,1,-1,3,-2)
----> 9 rolling = df.rolling(window=3).apply(groll_sort)
     10 print(rolling)

~/miniconda3/envs/branch15/lib/python3.8/site-packages/cudf/core/window/rolling.py in apply(self, func, *args, **kwargs)
    276                 "Handling UDF with null values is not yet supported"
    277             )
--> 278         return self._apply_agg(func)
    279
    280     def _normalize(self):

~/miniconda3/envs/branch15/lib/python3.8/site-packages/cudf/core/window/rolling.py in _apply_agg(self, agg_name)
    236             return self._apply_agg_series(self.obj, agg_name)
    237         else:
--> 238             return self._apply_agg_dataframe(self.obj, agg_name)
    239
    240     def sum(self):

~/miniconda3/envs/branch15/lib/python3.8/site-packages/cudf/core/window/rolling.py in _apply_agg_dataframe(self, df, agg_name)
    225         result_df = cudf.DataFrame({})
    226         for i, col_name in enumerate(df.columns):
--> 227             result_col = self._apply_agg_series(df[col_name], agg_name)
    228             result_df.insert(i, col_name, result_col)
    229         result_df.index = df.index

~/miniconda3/envs/branch15/lib/python3.8/site-packages/cudf/core/window/rolling.py in _apply_agg_series(self, sr, agg_name)
    201     def _apply_agg_series(self, sr, agg_name):
    202         if isinstance(self.window, int):
--> 203             result_col = libcudf.rolling.rolling(
    204                 sr._column,
    205                 None,

cudf/_lib/rolling.pyx in cudf._lib.rolling.rolling()

cudf/_lib/aggregation.pyx in cudf._lib.aggregation.make_aggregation()

cudf/_lib/aggregation.pyx in cudf._lib.aggregation._AggregationFactory.from_udf()

~/miniconda3/envs/branch15/lib/python3.8/site-packages/cudf/utils/cudautils.py in compile_udf(udf, type_signature)
    287     """
    288     decorated_udf = cuda.jit(udf, device=True)
--> 289     compiled = decorated_udf.compile(type_signature)
    290     ptx_code = decorated_udf.inspect_ptx(type_signature).decode("utf-8")
    291     output_type = numpy_support.as_dtype(compiled.signature.return_type)

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/cuda/compiler.py in compile(self, args)
    162         """
    163         if args not in self._compileinfos:
--> 164             cres = compile_cuda(self.py_func, None, args, debug=self.debug,
    165                                 inline=self.inline)
    166             first_definition = not self._compileinfos

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/cuda/compiler.py in compile_cuda(pyfunc, return_type, args, debug, inline)
     36         flags.set('forceinline')
     37     # Run compilation pipeline
---> 38     cres = compiler.compile_extra(typingctx=typingctx,
     39                                   targetctx=targetctx,
     40                                   func=pyfunc,

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    601     pipeline = pipeline_class(typingctx, targetctx, library,
    602                               args, return_type, flags, locals)
--> 603     return pipeline.compile_extra(func)
    604
    605

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler.py in compile_extra(self, func)
    337         self.state.lifted = ()
    338         self.state.lifted_from = None
--> 339         return self._compile_bytecode()
    340
    341     def compile_ir(self, func_ir, lifted=(), lifted_from=None):

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler.py in _compile_bytecode(self)
    399         """
    400         assert self.state.func_ir is None
--> 401         return self._compile_core()
    402
    403     def _compile_ir(self):

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler.py in _compile_core(self)
    379                 self.state.status.fail_reason = e
    380                 if is_final_pipeline:
--> 381                     raise e
    382         else:
    383             raise CompilerError("All available pipelines exhausted")

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler.py in _compile_core(self)
    370             res = None
    371             try:
--> 372                 pm.run(self.state)
    373                 if self.state.cr is not None:
    374                     break

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler_machinery.py in run(self, state)
    339                     (self.pipeline_name, pass_desc)
    340                 patched_exception = self._patch_error(msg, e)
--> 341                 raise patched_exception
    342
    343     def dependency_analysis(self):

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler_machinery.py in run(self, state)
    330                 pass_inst = _pass_registry.get(pss).pass_inst
    331                 if isinstance(pass_inst, CompilerPass):
--> 332                     self._runPass(idx, pass_inst, state)
    333                 else:
    334                     raise BaseException("Legacy pass in use")

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler_machinery.py in _runPass(self, index, pss, internal_state)
    289             mutated |= check(pss.run_initialization, internal_state)
    290         with SimpleTimer() as pass_time:
--> 291             mutated |= check(pss.run_pass, internal_state)
    292         with SimpleTimer() as finalize_time:
    293             mutated |= check(pss.run_finalizer, internal_state)

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/compiler_machinery.py in check(func, compiler_state)
    262
    263         def check(func, compiler_state):
--> 264             mangled = func(compiler_state)
    265             if mangled not in (True, False):
    266                 msg = ("CompilerPass implementations should return True/False. "

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/typed_passes.py in run_pass(self, state)
     90                               % (state.func_id.func_name,)):
     91             # Type inference
---> 92             typemap, return_type, calltypes = type_inference_stage(
     93                 state.typingctx,
     94                 state.func_ir,

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/typed_passes.py in type_inference_stage(typingctx, interp, args, return_type, locals, raise_errors)
     68
     69         infer.build_constraint()
---> 70         infer.propagate(raise_errors=raise_errors)
     71         typemap, restype, calltypes = infer.unify(raise_errors=raise_errors)
     72

~/miniconda3/envs/branch15/lib/python3.8/site-packages/numba/core/typeinfer.py in propagate(self, raise_errors)
    992                                   if isinstance(e, ForceLiteralArg)]
    993                 if not force_lit_args:
--> 994                     raise errors[0]
    995                 else:
    996                     raise reduce(operator.or_, force_lit_args)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'median' of type array(float64, 1d, A)

File "<ipython-input-6-99e3758f2d02>", line 5:
def groll_sort(x):
    t = x.median() #np.median(x.values)
    ^

During: typing of get attribute at <ipython-input-6-99e3758f2d02> (5)

File "<ipython-input-6-99e3758f2d02>", line 5:
def groll_sort(x):
    t = x.median() #np.median(x.values)

The same code runs on pandas and gives the output shown below.

Input:

import pandas
import numpy as np
def groll_sort(x):
    t = x.median()
    return t
df = pandas.DataFrame()
df['a'] = (0.25, 0.3, 0.5,1,3,1,-1,3,-2)
rolling = df.rolling(window=3).apply(groll_sort)
print(rolling)

Output:

     a
0  NaN
1  NaN
2  0.3
3  0.5
4  1.0
5  1.0
6  1.0
7  1.0
8 -1.0

kkraus14 commented 4 years ago

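The issue is that we JIT-compile the UDF with Numba, and Numba can't lower the UDF provided here into a GPU kernel. It can't be lowered because the function being passed isn't written as a kernel over the window's elements; it calls `median`, an array- / column-level method that Numba does not know for the array type it receives (hence `Unknown attribute 'median' of type array(float64, 1d, A)` in the traceback).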
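As a rough illustration of what a "kernel-style" UDF might look like, here is a minimal sketch of a median written only with loops, indexing, and scalar comparisons. The name `rolling_median` is hypothetical, and it assumes the window arrives as a 1-D array that supports `len()` and indexing (as the `array(float64, 1d, A)` type in the traceback suggests). The check below runs on plain Python lists on the CPU only; it has not been verified against cuDF or the Numba CUDA target.

```python
def rolling_median(window):
    """Median of a window using only loops and scalar comparisons.

    No sorting or temporary arrays are used; each element's rank is found
    by counting, which costs O(n^2) comparisons per window.
    """
    n = len(window)
    lo_rank = (n - 1) // 2  # lower middle position in sorted order
    hi_rank = n // 2        # upper middle position (same as lo_rank for odd n)
    lo_val = 0.0
    hi_val = 0.0
    for i in range(n):
        less = 0
        equal = 0
        for j in range(n):
            if window[j] < window[i]:
                less += 1
            elif window[j] == window[i]:
                equal += 1
        # window[i] occupies sorted positions less .. less + equal - 1
        if less <= lo_rank < less + equal:
            lo_val = window[i]
        if less <= hi_rank < less + equal:
            hi_val = window[i]
    return 0.5 * (lo_val + hi_val)


# CPU-only check on the data from the report, using plain lists.
values = [0.25, 0.3, 0.5, 1, 3, 1, -1, 3, -2]
medians = [rolling_median(values[i:i + 3]) for i in range(len(values) - 2)]
print(medians)  # [0.3, 0.5, 1.0, 1.0, 1.0, 1.0, -1.0]
```

The printed values match the non-NaN rows of the pandas output above. In principle such a function could be passed as `df.rolling(window=3).apply(rolling_median)`, but whether it compiles depends on the cuDF and Numba versions in use.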

We could make the typical pandas-style `apply` work by iterating over the windows, but that would be painfully slow when there is a large number of small windows.
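To make the iteration idea concrete, here is a minimal CPU-side sketch, not cuDF's implementation. The helper name `rolling_apply_by_iteration` is hypothetical, and it works on a plain Python list rather than a GPU column. It makes one Python-level call to `func` per window, which is the per-window overhead that becomes costly when there are many small windows.

```python
import math
import statistics


def rolling_apply_by_iteration(values, window, func):
    """Apply func to each full trailing window, one Python call per window.

    Positions that do not yet have a full window get NaN, mirroring the
    leading NaN rows in the pandas output.
    """
    out = []
    for end in range(1, len(values) + 1):
        if end < window:
            out.append(math.nan)
        else:
            out.append(func(values[end - window:end]))
    return out


values = [0.25, 0.3, 0.5, 1, 3, 1, -1, 3, -2]
result = rolling_apply_by_iteration(values, 3, statistics.median)
print(result)
```

Because `func` is an arbitrary Python callable here, any array-level function works, at the price of running the windows one at a time.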

github-actions[bot] commented 3 years ago

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.