gpuopenanalytics / demo-docker

Demo notebooks inside a docker for end-to-end examples

GDFError: GDF_UNSUPPORTED_DTYPE with std() function #7

michael-balint closed this issue 7 years ago

michael-balint commented 7 years ago

When running notebooks/mapd_to_pygdf_to_h2oaiglm.ipynb, an error occurs at step 23, in the following loop:

for k in (num_cols - response_set):
    df[k] = df[k].fillna(df[k].mean())
    assert df[k].null_count == 0
    std = df[k].std()
    # drop near constant columns
    if not np.isfinite(std) or std < 1e-4:
        del df[k]
        print('drop near constant', k)
    else:
        df[k] = df[k].scale()

Error output:

---------------------------------------------------------------------------
GDFError                                  Traceback (most recent call last)
<ipython-input-26-43006e4ffe8b> in <module>()
      2     df[k] = df[k].fillna(df[k].mean())
      3     assert df[k].null_count == 0
----> 4     std = df[k].std()
      5     # drop near constant columns
      6     if not np.isfinite(std) or std < 1e-4:

/home/appuser/pygdf/pygdf/dataframe.py in std(self)
   1074         """Compute the standard deviation of the series
   1075         """
-> 1076         return np.sqrt(self.var())
   1077 
   1078     def var(self):

/home/appuser/pygdf/pygdf/dataframe.py in var(self)
   1079         """Compute the variance of the series
   1080         """
-> 1081         mu, var = self.mean_var()
   1082         return var
   1083 

/home/appuser/pygdf/pygdf/dataframe.py in mean_var(self)
   1085         """Compute mean and variance at the same time.
   1086         """
-> 1087         mu, var = self._impl.stats(self).mean_var()
   1088         return mu, var
   1089 

/home/appuser/pygdf/pygdf/numerical.py in mean_var(self)
    130         mu = self.mean()
    131         n = len(self._series)
--> 132         asum = _gdf.apply_reduce(libgdf.gdf_sum_squared_generic, self._series)
    133         var = asum / n - mu ** 2
    134         return mu, var

/home/appuser/pygdf/pygdf/_gdf.py in apply_reduce(fn, inp)
     82     out = cuda.device_array(outsz, dtype=inp.dtype)
     83     # call reduction
---> 84     fn(inp._cffi_view, unwrap_devary(out), outsz)
     85     # return 1st element
     86     return out[0]

/home/appuser/Miniconda3/envs/pycudf_notebook_py35/lib/python3.5/site-packages/libgdf_cffi/wrapper.py in wrap(*args)
     26                         raw = self._api.gdf_error_get_name(errcode)
     27                         errname = self._ffi.string(raw).decode('ascii')
---> 28                         raise GDFError(errcode, errname)
     29 
     30                 wrap.__name__ = fn.__name__

GDFError: GDF_UNSUPPORTED_DTYPE

Here, df is a pygdf.dataframe.DataFrame, df[k] is a pygdf.dataframe.Series, and df[k][0] is a numpy.int32.

sklam commented 7 years ago

Some routines have been converted from JIT-compiled versions into statically compiled versions in libgdf. This error is raised when an operation does not support the dtype of its input. In this case, either a typecast or a type specialization is missing.
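
To illustrate the failure mode described above, here is a minimal pure-Python sketch, not the actual libgdf or pygdf code. It assumes a reduction that has one specialization per element type and none for int32, and it uses the standard-library `array` typecodes as a stand-in for dtypes. The names `_SUM_SQUARED`, `UnsupportedDtype`, `sum_squared` and `std` are hypothetical and exist only for this example. The variance formula mirrors the one visible in the traceback (`asum / n - mu ** 2`).

```python
import math
from array import array

# Hypothetical stand-in for a statically compiled reduction: one
# specialization per element type, keyed by array typecode.
# 'f' is a C float and 'd' is a C double; there is deliberately no
# entry for 'i' (C int), which plays the role of the int32 column.
_SUM_SQUARED = {
    "f": lambda values: sum(v * v for v in values),
    "d": lambda values: sum(v * v for v in values),
}


class UnsupportedDtype(TypeError):
    """Illustrative analogue of GDF_UNSUPPORTED_DTYPE (not the real error)."""


def sum_squared(col):
    # Dispatch on the element type; fail if no specialization exists.
    try:
        kernel = _SUM_SQUARED[col.typecode]
    except KeyError:
        raise UnsupportedDtype(col.typecode) from None
    return kernel(col)


def std(col):
    # Same formula as in the traceback: var = sum(x^2) / n - mean^2
    n = len(col)
    mu = sum(col) / n
    var = sum_squared(col) / n - mu ** 2
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(var, 0.0))


ints = array("i", [1, 2, 3, 4])

# The integer column has no specialization, so the reduction fails.
try:
    std(ints)
    failed = False
except UnsupportedDtype:
    failed = True

# Workaround sketch: cast to a supported floating-point type first.
as_float = array("d", (float(v) for v in ints))
result = std(as_float)
```

In the notebook, the analogous workaround would be to cast the integer column to a floating-point dtype before calling `std()`. Whether the pygdf version shown in the traceback exposes a suitable cast is not confirmed here, so treat that step as an assumption to verify.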