rsquaredacademy / rfm

Tools for Customer Segmentation using RFM Analysis
https://rfm.rsquaredacademy.com/

Frequency bin class #57

Closed gfagherazzi closed 4 years ago

gfagherazzi commented 5 years ago

Hi, I need to modify the way RFM bins are created. For frequency, for example, class 2 is too large in my case. Thank you.

aravindhebbali commented 5 years ago

Hi @gfagherazzi, is it possible to share some more information? It will help me understand why class 2 is so large.

Thank you

gfagherazzi commented 5 years ago

Hi @aravindhebbali, yes, sure. Please have a look at my heatmap.

[image: heatmap]

aravindhebbali commented 5 years ago

This usually happens when there is not much variation in the frequency data, i.e. the transaction counts of the customers. In your data set, customers have either a very low or a very high transaction count.
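To illustrate the point, here is a minimal sketch in base R. It assumes, purely for illustration, that the bin edges are taken from quantiles of the transaction counts; `txn_count` is made-up data, not the data from this issue.

```r
# Hypothetical transaction counts: most customers ordered once or twice,
# a handful ordered many times.
txn_count <- c(rep(1, 40), rep(2, 45), rep(3, 5),
               20, 25, 30, 35, 40, 45, 50, 55, 60, 65)

# Quintile edges (assumption: edges come from quantiles of the counts).
edges <- quantile(txn_count, probs = seq(0, 1, by = 0.2))
print(edges)

# Number of customers sharing each distinct count.
print(table(txn_count))
```

With so little variation, several of the quintile edges coincide, so some bins end up nearly empty while one bin absorbs most customers.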

gfagherazzi commented 5 years ago

OK, I see. Is there somewhere I can read the RFM thresholds for each class? In this case, for instance, it would be very useful to know how frequency is split across the 5 bins. Thank you, G.

aravindhebbali commented 5 years ago

OK, I will create a new branch in which the output will include the thresholds used. Which function are you using to create the RFM results?

I will create a variant of the function so that it will return the thresholds.

gfagherazzi commented 5 years ago

Wow, super! I'm using the rfm_table_order() function.

aravindhebbali commented 5 years ago

I have created a new branch, threshold. Install the package from this branch and you should be able to access the thresholds.

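# Install the package from the threshold branch
devtools::install_github("rsquaredacademy/rfm@threshold")

library(rfm)

# Date against which recency is measured
analysis_date <- lubridate::as_date('2006-12-31', tz = 'UTC')

# Build the RFM table from the order-level example data shipped with rfm
result        <- rfm_table_order(rfm_data_orders,
                                 customer_id,
                                 order_date,
                                 revenue,
                                 analysis_date)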

result$threshold
# A tibble: 5 x 6
  recency_lower recency_upper frequency_lower frequency_upper monetary_lower monetary_upper
          <dbl>         <dbl>           <dbl>           <dbl>          <dbl>          <dbl>
1            1           115                1               4            12            256.
2          115           181                4               5           256.           382 
3          181           297.               5               6           382            506.
4          297.          482                6               8           506.           666 
5          482           977                8              15           666           1489 

The intervals are created in the below style:

Left-closed, right-open: [ a , b ) = { x ∣ a ≤ x < b }

In the above example, the bins are assigned as follows:

Bins   Frequency/Transaction Count
   1   1 - 3
   2   4
   3   5
   4   6 - 7
   5   > 7
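The assignment in the table can be reproduced with a short base R sketch. It copies the `frequency_lower` column from the threshold output and relies on `findInterval()`, whose default intervals are left-closed and right-open; `txn_count` is a made-up vector of example counts, and this is not necessarily how the package computes the bins internally.

```r
# Lower edges of the five frequency bins, copied from result$threshold.
frequency_lower <- c(1, 4, 5, 6, 8)

# Example transaction counts to score (made up for illustration).
txn_count <- c(1, 3, 4, 5, 6, 7, 8, 15)

# findInterval() uses left-closed, right-open intervals by default:
# a count x lands in bin i when frequency_lower[i] <= x < frequency_lower[i + 1].
frequency_bin <- findInterval(txn_count, frequency_lower)

print(data.frame(txn_count, frequency_bin))
```

Counts of 1 and 3 land in bin 1, 4 in bin 2, 5 in bin 3, 6 and 7 in bin 4, and 8 and 15 in bin 5.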

Hope this helps.

gfagherazzi commented 5 years ago

Super useful. Thank you so much!

G.

gfagherazzi commented 5 years ago

Sorry to disturb you again, but I think a method to override the bin values in the rfm_result tibble would be really useful.

Something like the segments modeling:

recency_lower <- c(4, 2, 3, 4, 3, 2, 2, 1, 1, 1)
recency_upper <- c(5, 4, 5, 5, 4, 3, 3, 2, 1, 2)
....

Thank you G.

aravindhebbali commented 5 years ago

You mean you want to be able to specify the intervals manually?

gfagherazzi commented 5 years ago

Yes exactly
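One way manually specified intervals could work is sketched below in base R. The function `score_by_bounds()` and its arguments are hypothetical and not part of the rfm package; the bounds are made up and follow the left-closed, right-open convention described earlier in the thread.

```r
# Hypothetical helper: assign a score to each value from user-supplied bounds.
# Interval i is [lower[i], upper[i]); values outside every interval get NA.
score_by_bounds <- function(x, lower, upper) {
  stopifnot(length(lower) == length(upper), all(lower < upper))
  score <- rep(NA_integer_, length(x))
  for (i in seq_along(lower)) {
    in_bin <- x >= lower[i] & x < upper[i]
    score[in_bin] <- i
  }
  score
}

# Made-up bounds that split the crowded low-frequency range more finely.
frequency_lower <- c(1, 2, 3, 5, 10)
frequency_upper <- c(2, 3, 5, 10, Inf)

txn_count <- c(1, 2, 4, 9, 30)
frequency_score <- score_by_bounds(txn_count, frequency_lower, frequency_upper)
print(frequency_score)
```

Passing explicit lower and upper vectors mirrors the style of the segment definitions quoted above, and it lets a crowded class be split without changing how the other classes are scored.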

aravindhebbali commented 5 years ago

#58

Will experiment with this enhancement in the threshold branch sometime this week and will keep you posted.