pymc-labs / pymc-marketing

Bayesian marketing toolbox in PyMC. Media Mix (MMM), customer lifetime value (CLV), buy-till-you-die (BTYD) models and more.
https://www.pymc-marketing.io/
Apache License 2.0

sample_posterior_predictive TypeError: ufunc 'isnan' not supported for the input types #1065

Closed mike-duran-mitchell closed 4 days ago

mike-duran-mitchell commented 4 days ago

I'm using pymc-marketing v0.9, but this also happened in v0.8 before I upgraded to try to get past the error. All values in my df are floats and ints except for the date column, which is a properly formatted datetime column. None of the values are NA when I check with isna in pandas, and I ran df.fillna(0) to make sure. My MMM has 0 divergences, and I can plot the model trace and do the other model-inspection steps with no issue. I've tried models with custom priors and models without them, and the error occurs either way.

When I run mmm.sample_posterior_predictive(X, extend_idata=True, combined=True), I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[20], line 1
----> 1 mmm.sample_posterior_predictive(X, extend_idata=True, combined=True)

File /opt/conda/lib/python3.10/site-packages/pymc_marketing/mmm/mmm.py:1874, in MMM.sample_posterior_predictive(self, X_pred, extend_idata, combined, include_last_observations, original_scale, **sample_posterior_predictive_kwargs)
   1869 if include_last_observations:
   1870     X_pred = pd.concat(
   1871         [self.X.iloc[-self.adstock.l_max :, :], X_pred], axis=0
   1872     ).sort_values(by=self.date_column)
-> 1874 self._data_setter(X_pred)
   1876 with self.model:  # sample with new input data
   1877     post_pred = pm.sample_posterior_predictive(
   1878         self.idata, **sample_posterior_predictive_kwargs
   1879     )

File /opt/conda/lib/python3.10/site-packages/pymc_marketing/mmm/mmm.py:760, in BaseMMM._data_setter(self, X, y)
    757     data["target"] = np.zeros(X.shape[0], dtype=dtype)  # type: ignore
    759 with self.model:
--> 760     pm.set_data(data, coords=coords)

File /opt/conda/lib/python3.10/site-packages/pymc/model/core.py:2126, in set_data(new_data, model, coords)
   2123 model = modelcontext(model)
   2125 for variable_name, new_value in new_data.items():
-> 2126     model.set_data(variable_name, new_value, coords=coords)

File /opt/conda/lib/python3.10/site-packages/pymc/model/core.py:1126, in Model.set_data(self, name, values, coords)
   1124 if isinstance(values, list):
   1125     values = np.array(values)
-> 1126 values = convert_observed_data(values)
   1127 dims = self.named_vars_to_dims.get(name, None) or ()
   1128 coords = coords or {}

File /opt/conda/lib/python3.10/site-packages/pymc/pytensorf.py:86, in convert_observed_data(data)
     84 if isgenerator(data):
     85     return convert_generator_data(data)
---> 86 return convert_data(data)

File /opt/conda/lib/python3.10/site-packages/pymc/pytensorf.py:126, in convert_data(data)
    123         ret = data
    124 else:
    125     # already a ndarray, but not masked
--> 126     mask = np.isnan(data)
    127     if np.any(mask):
    128         ret = np.ma.MaskedArray(data, mask)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Any ideas on that?
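
For anyone reading the traceback above: the last frame calls np.isnan(data), and NumPy has no isnan implementation for object-dtype arrays, so this TypeError appears whenever the array that reaches that line has dtype object. One plausible cause (an assumption, not confirmed in this thread) is a column stored with object dtype or a pandas extension dtype rather than a plain NumPy numeric dtype. Here is a minimal sketch for checking that. The helper names isnan_fails and suspect_columns are hypothetical and are not part of pymc-marketing, PyMC, or pandas.

```python
import numpy as np
import pandas as pd


def isnan_fails(values) -> bool:
    """Return True if np.isnan raises TypeError on the array form of `values`."""
    try:
        np.isnan(np.asarray(values))
    except TypeError:
        return True
    return False


def suspect_columns(df: pd.DataFrame) -> list:
    """List columns whose dtype is not a plain NumPy bool/int/uint/float dtype.

    Datetime columns are flagged too; the date column is expected to show up here.
    """
    flagged = []
    for col in df.columns:
        dtype = df[col].dtype
        is_plain_numeric = isinstance(dtype, np.dtype) and dtype.kind in "biuf"
        if not is_plain_numeric:
            flagged.append(col)
    return flagged


# An object-dtype array reproduces the TypeError even though every value is a float.
object_array = np.array([1.0, 2.0], dtype=object)
float_array = np.array([1.0, 2.0])
print(isnan_fails(object_array), isnan_fails(float_array))
```

Running suspect_columns(X) on the frame passed to sample_posterior_predictive, and comparing the result with X.dtypes, should show whether any column other than the date column is stored in a non-NumPy dtype.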

mike-duran-mitchell commented 4 days ago

Actually, I resolved this myself with the function below, which updates everything to dtypes that work in numpy.

def convert_dtypes(x):
    if pd.api.types.is_integer_dtype(x):
        return x.astype('Int64')
    elif pd.api.types.is_float_dtype(x):
        return x.astype('Float64')
    return x

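# Apply the function to every element of the DataFrame
# (applymap is element-wise; pandas 2.1+ deprecates it in favour of DataFrame.map)
X = X.applymap(convert_dtypes)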
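
A note on the workaround above: applymap calls the function once per element, so the dtype checks see individual values rather than whole columns. It may be that the snippet helps mainly because pandas rebuilds the columns with freshly inferred dtypes, but that is a guess. A more explicit alternative, sketched here under the assumption that every non-date column is meant to be numeric, is to cast those columns to plain NumPy float64. The function name to_numpy_dtypes, the date_column default, and the demo column names are hypothetical.

```python
import numpy as np
import pandas as pd


def to_numpy_dtypes(df: pd.DataFrame, date_column: str = "date") -> pd.DataFrame:
    """Return a copy of `df` with every non-date column cast to NumPy float64."""
    out = df.copy()
    for col in out.columns:
        if col == date_column:
            continue  # leave the datetime column untouched
        # to_numeric raises on values that cannot be parsed as numbers,
        # which surfaces stray strings instead of silently hiding them.
        out[col] = pd.to_numeric(out[col], errors="raise").astype("float64")
    return out


# Demo frame with a nullable-integer column and an object column (made-up data).
X_demo = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-08"]),
        "tv": pd.array([1, 2], dtype="Int64"),
        "radio": np.array([0.5, 1.5], dtype=object),
    }
)
X_fixed = to_numpy_dtypes(X_demo, date_column="date")
print(X_fixed.dtypes)
```

Casting everything to float64 is the bluntest option. If integer columns need to stay integers, casting them to "int64" (lowercase, the NumPy dtype) instead of "Int64" (the pandas nullable dtype) keeps them as plain NumPy arrays.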