pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: NotImplementedError: Cannot apply ufunc <ufunc 'hyp2f1'> to mixed DataFrame and Series inputs. #46138

Closed timmy-ops closed 2 years ago

timmy-ops commented 2 years ago

### Pandas version checks

### Reproducible Example

#imports
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import math
import pandas as pd
import numpy as np
from datetime import datetime
!pip install lifetimes
from lifetimes import ParetoNBDFitter, GammaGammaFitter

#authenticate in Colab and build the Drive client used by drive.CreateFile below
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

#data
f_and_t = drive.CreateFile({'id': '1sXcv0SUUygyFvjVEtdV3kk8zyjp4GGQC'})
f_and_t.GetContentFile('f_and_t.csv')
f_and_t = pd.read_csv('f_and_t.csv')

#reproducible example

time_days = 126
time_months = int(math.ceil(time_days / 30.0))   

#column-selection

summary = f_and_t[['customer_id', 'frequency_btyd', 'recency', 'T',
                 'monetary_btyd']]

summary.columns = ['customer_id', 'frequency', 'recency', 'T',
                     'monetary_value']
summary = summary.set_index('customer_id')

actual_df = f_and_t[['customer_id', 'frequency_btyd', 'monetary_dnn',
                     'target_monetary']]
actual_df.columns = ['customer_id', 'train_frequency', 'train_monetary',
                       'act_target_monetary']

#PARETO/NBD fitter
paretof = ParetoNBDFitter(penalizer_coef=0.01)
paretof.fit(summary['frequency'], summary['recency'], summary['T'])

#Gamma Gamma Fitter

ggf = GammaGammaFitter(penalizer_coef=0)
ggf.fit(summary['frequency'], summary['monetary_value'])

#pareto predict

pareto_pred = paretof.predict(time_days,
                              summary['frequency'].values,
                              summary['recency'],
                              summary['T'])

trans_pred = pareto_pred.fillna(0)

#gg predict

predicted_value = ggf.customer_lifetime_value(paretof,
                                                summary['frequency'],#.values,
                                                summary['recency'],
                                                summary['T'],
                                                summary['monetary_value'],
                                                time=time_months,
                                              discount_rate=0.01)

### Issue Description

I was using the lifetimes library to calculate CLV for a list of customers. From one day to the next, this issue appeared. I work on Google Colab with pandas 1.3.5 (its current version). The error below appears for both functions, paretof.predict and ggf.customer_lifetime_value.

I already found posts about this issue from half a year ago (https://stackoverflow.com/questions/69071130/lifetimes-library-issue-of-calculating-clv-when-using-function-customer-lifet). The suggested solution of passing ".values" only worked for the paretof.predict function; with ggf.customer_lifetime_value I am stuck.
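For reference, a sketch of that workaround as applied to paretof.predict (assuming the summary frame built above; every Series argument is passed as a plain numpy array via .values):

pareto_pred = paretof.predict(time_days,
                              summary['frequency'].values,
                              summary['recency'].values,
                              summary['T'].values)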

NotImplementedError                       Traceback (most recent call last)
<ipython-input-...> in <module>()
     58                                               summary['monetary_value'],
     59                                               time=time_months,
---> 60                                               discount_rate=discount_rate)
     61 
     62 

6 frames

/usr/local/lib/python3.7/dist-packages/lifetimes/fitters/gamma_gamma_fitter.py in customer_lifetime_value(self, transaction_prediction_model, frequency, recency, T, monetary_value, time, discount_rate, freq)
    294 
    295         return _customer_lifetime_value(
--> 296             transaction_prediction_model, frequency, recency, T, adjusted_monetary_value, time, discount_rate, freq=freq
    297         )

/usr/local/lib/python3.7/dist-packages/lifetimes/utils.py in _customer_lifetime_value(transaction_prediction_model, frequency, recency, T, monetary_value, time, discount_rate, freq)
    496         # since the prediction of number of transactions is cumulative, we have to subtract off the previous periods
    497         expected_number_of_transactions = transaction_prediction_model.predict(
--> 498             i, frequency, recency, T
    499         ) - transaction_prediction_model.predict(i - factor, frequency, recency, T)
    500         # sum up the CLV estimates of all of the periods and apply discounted cash flow

/usr/local/lib/python3.7/dist-packages/lifetimes/fitters/pareto_nbd_fitter.py in conditional_expected_number_of_purchases_up_to_time(self, t, frequency, recency, T)
    277         r, alpha, s, beta = params
    278 
--> 279         likelihood = self._conditional_log_likelihood(params, x, t_x, T)
    280         first_term = (
    281             gammaln(r + x) - gammaln(r) + r * log(alpha) + s * log(beta) - (r + x) * log(alpha + T) - s * log(beta + T)

/usr/local/lib/python3.7/dist-packages/lifetimes/fitters/pareto_nbd_fitter.py in _conditional_log_likelihood(params, freq, rec, T)
    212 
    213         A_1 = gammaln(r + x) - gammaln(r) + r * log(alpha) + s * log(beta)
--> 214         log_A_0 = ParetoNBDFitter._log_A_0(params, x, rec, T)
    215 
    216         A_2 = logaddexp(-(r + x) * log(alpha + T) - s * log(beta + T), log(s) + log_A_0 - log(r_s_x))

/usr/local/lib/python3.7/dist-packages/lifetimes/fitters/pareto_nbd_fitter.py in _log_A_0(params, freq, recency, age)
    179 
    180         rsf = r + s + freq
--> 181         p_1 = hyp2f1(rsf, t, rsf + 1.0, abs_alpha_beta / (max_of_alpha_beta + recency))
    182         q_1 = max_of_alpha_beta + recency
    183         p_2 = hyp2f1(rsf, t, rsf + 1.0, abs_alpha_beta / (max_of_alpha_beta + age))

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
   2030         self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any
   2031     ):
-> 2032         return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
   2033 
   2034     # ideally we would define this to avoid the getattr checks, but

/usr/local/lib/python3.7/dist-packages/pandas/core/arraylike.py in array_ufunc(self, ufunc, method, *inputs, **kwargs)
    292             raise NotImplementedError(
    293                 "Cannot apply ufunc {} to mixed DataFrame and Series "
--> 294                 "inputs.".format(ufunc)
    295             )
    296         axes = self.axes

NotImplementedError: Cannot apply ufunc <ufunc 'hyp2f1'> to mixed DataFrame and Series inputs.
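The pandas-level behavior the trace ends in can be reproduced without lifetimes. A minimal, dependency-free sketch (np.hypot stands in for scipy's hyp2f1; any multi-input ufunc that pandas does not map to an operator should behave the same way on pandas 1.3.x):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]})
s = pd.Series([1.0, 2.0])

try:
    np.hypot(df, s)
except NotImplementedError as err:
    # Cannot apply ufunc <ufunc 'hypot'> to mixed DataFrame and Series inputs.
    print(err)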



### Expected Behavior

Sometimes it works, but mostly it doesn't anymore. It should simply run without raising an error.

### Installed Versions

<details>

INSTALLED VERSIONS
------------------
commit           : 66e3805b8cabe977f40c05259cc3fcf7ead5687d
python           : 3.7.12.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.4.144+
Version          : #1 SMP Tue Dec 7 09:58:10 PST 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.3.5
numpy            : 1.21.5
pytz             : 2018.9
dateutil         : 2.8.2
pip              : 21.1.3
setuptools       : 57.4.0
Cython           : 0.29.28
pytest           : 3.6.4
hypothesis       : None
sphinx           : 1.8.6
blosc            : None
feather          : 0.4.1
xlsxwriter       : None
lxml.etree       : 4.2.6
html5lib         : 1.0.1
pymysql          : None
psycopg2         : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2           : 2.11.3
IPython          : 5.5.0
pandas_datareader: 0.9.0
bs4              : 4.6.3
bottleneck       : 1.3.2
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.2.2
numexpr          : 2.8.1
odfpy            : None
openpyxl         : 3.0.9
pandas_gbq       : 0.13.3
pyarrow          : 6.0.1
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.4.31
tables           : 3.7.0
tabulate         : 0.8.9
xarray           : 0.18.2
xlrd             : 1.1.0
xlwt             : 1.3.0
numba            : 0.51.2

</details>
jreback commented 2 years ago

pls show a minimal copy pastable and reproducible example w/o any external dependencies

timmy-ops commented 2 years ago

> pls show a minimal copy pastable and reproducible example w/o any external dependencies

Hi jreback,

Yes, I'm sorry. I tried to produce one, but the problem is that the whole model cannot work without this larger dataset.

mroeschke commented 2 years ago

It will be difficult to determine whether there is a true bug here without a more minimal example: https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

ColtAllen commented 2 years ago

Hey @timmy-ops ,

This is not an issue with pandas, but rather the lifetimes library. Please repost this issue in the lifetimes repository.

The scipy.special.hyp2f1 function in the final frames of your error trace is a lifetimes dependency that expects numpy arrays as inputs. When using any of the lifetimes modeling methods, always pass df['COL_NAME'].values for all of the arguments; otherwise hyp2f1 will receive a sliced-up pandas DataFrame and produce the unstable behavior you are seeing, as sketched below.
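For example, a sketch of that syntax applied to the fit calls from the report above (summary as defined there):

paretof = ParetoNBDFitter(penalizer_coef=0.01)
paretof.fit(summary['frequency'].values,
            summary['recency'].values,
            summary['T'].values)

ggf = GammaGammaFitter(penalizer_coef=0)
ggf.fit(summary['frequency'].values,
        summary['monetary_value'].values)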

Unfortunately, in the case of the lifetimes.GammaGammaFitter.customer_lifetime_value method, pandas slices are used in the internal operations, so the .values workaround does not help there. It's an easy fix, but the lifetimes project is no longer actively maintained. Some other contributors and I are planning a Zoom meeting in a few weeks to discuss taking over development of the library. If you wish to contribute, please let us know in this issue:

https://github.com/CamDavidsonPilon/lifetimes/issues/414

swasthikshettyhcl commented 3 months ago

@timmy-ops did you find any solution for this? [error] Cannot apply ufunc <ufunc 'hyp2f1'> to mixed DataFrame and Series inputs.