ColtAllen closed this 8 months ago
@ColtAllen is this one done, or is that extra column still needed?
I plan to open a PR for an `rfm_segmentation` utility in the near future. We can just apply the column transformation within that function, because I'd rather not add an unnecessary column to the `rfm_summary` output.
The `clv_summary` function is the primary data preprocessing step for `BetaGeoModel`, `ParetoNBDModel`, and `GammaGammaModel`. It has several shortcomings:

- [...] `customer_id` column, but this function is not creating one.
- `pandas.sort_values` is called internally by this function, which can cause memory crashes with large datasets (say, >10M rows). I'm not aware of a viable workaround for this, but a `UserWarning` can be added, and/or a `sort_values` parameter to skip this operation if sorting is already being applied on the DB side.
- [...] `include_first_transactions` parameter must be added. This can be adapted from `lifetimes`.
- [...] `recency` than what is used for modeling. To reduce confusion, let's just add an additional column for this to the output DF.
- [...] `rfm_summary`.
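
To make the sorting and first-transaction points concrete, here is a minimal sketch of how the opt-out sort, the `UserWarning`, and an explicit `customer_id` column might fit together. It is not the actual `clv_summary` implementation: the function name `summarize_transactions`, the parameters `sort_transactions` and `warn_above_rows`, and the simplified frequency calculation (distinct timestamps per customer) are all assumptions made for illustration.

```python
import warnings

import pandas as pd


def summarize_transactions(
    transactions: pd.DataFrame,
    customer_id_col: str,
    datetime_col: str,
    sort_transactions: bool = True,  # hypothetical opt-out for pre-sorted data
    include_first_transactions: bool = False,  # name as proposed in this issue
    warn_above_rows: int = 10_000_000,  # hypothetical threshold (">10M rows")
) -> pd.DataFrame:
    """Illustrative only: per-customer frequency with a customer_id column."""
    if sort_transactions:
        if len(transactions) > warn_above_rows:
            warnings.warn(
                "Sorting a large DataFrame may exhaust memory; consider sorting "
                "on the database side and passing sort_transactions=False.",
                UserWarning,
            )
        transactions = transactions.sort_values([customer_id_col, datetime_col])

    # Keep one row per customer and distinct purchase timestamp.
    purchases = transactions[[customer_id_col, datetime_col]].drop_duplicates()
    counts = purchases.groupby(customer_id_col, sort=False)[datetime_col].count()

    # Without the first transaction, frequency counts repeat purchases only.
    frequency = counts if include_first_transactions else counts - 1

    # Expose the customer identifier as a regular column named customer_id.
    return frequency.rename("frequency").rename_axis("customer_id").reset_index()
```

With `sort_transactions=False` the function trusts the caller's ordering and never calls `sort_values`, so rows come back in order of first appearance. Making the warning threshold a parameter keeps the behaviour testable on small frames.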