google / lightweight_mmm

LightweightMMM 🦇 is a lightweight Bayesian Marketing Mix Modeling (MMM) library that allows users to easily train MMMs and obtain channel attribution information.
https://lightweight-mmm.readthedocs.io/en/latest/index.html
Apache License 2.0

find_optimal_budgets current function value returning nan #42

Closed virithavanama closed 2 years ago

virithavanama commented 2 years ago

Hi Team,

  1. find_optimal_budgets current function value returning nan
  2. previous and optimal budget allocation values are always equal, no matter how much I change the values and range
  3. one channel's previous budget is returning 0 even though there is budget present
  4. Media contribution is higher for channel 1, whereas ROI is higher for channel 0

Thanks in advance

pabloduque0 commented 2 years ago

Hello! I might need a bit more information, let me ask a few questions to see what is going on:

It could be that there is some misalignment between the budget provided, the prices and the bounds. Could you share what values you have provided for those arguments? Or check they are in sync with one another.

virithavanama commented 2 years ago

Hi, the optimization just gave a warning, as shown in the attached image, and the output is "Singular matrix E in LSQ subproblem (Exit mode 5)". I'm passing media cost in media_data; all the values I have passed are in the attached screenshot.

pabloduque0 commented 2 years ago

Can you confirm the model converged? (rhats in the print_summary are less than 1.1) Just to rule that out first.
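
For reference, this is roughly what the r_hat threshold checks. The sketch below computes the classic (non-split) Gelman-Rubin statistic with plain NumPy on synthetic chains; the function name and the data are made up for illustration, and the value reported by print_summary may come from a slightly different variant.

```python
import numpy as np


def gelman_rubin(samples: np.ndarray) -> float:
    """Classic (non-split) R-hat for one parameter.

    `samples` has shape (n_chains, n_draws).
    """
    n_chains, n_draws = samples.shape
    chain_means = samples.mean(axis=1)
    # W: average of the within-chain variances.
    within = samples.var(axis=1, ddof=1).mean()
    # B / n: variance of the chain means.
    between_over_n = chain_means.var(ddof=1)
    var_estimate = (n_draws - 1) / n_draws * within + between_over_n
    return float(np.sqrt(var_estimate / within))


rng = np.random.default_rng(0)
mixed = rng.normal(size=(2, 1000))         # two chains, same distribution
stuck = mixed + np.array([[0.0], [3.0]])   # second chain shifted by 3

r_hat_mixed = gelman_rubin(mixed)
r_hat_stuck = gelman_rubin(stuck)
print(r_hat_mixed < 1.1, r_hat_stuck < 1.1)
```

Chains that agree give a value close to 1, while chains that sit in different regions push it well above 1.1.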

virithavanama commented 2 years ago

Yes r_hat values are less than 1.1

pabloduque0 commented 2 years ago

Apologies for the late response. Thanks for checking that!

Generally that error would be (at least in this scenario) related to the constraint, in this case the budget constraint.
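
A quick sanity check along those lines, as a minimal sketch: it assumes the optimiser requires the total spend (media units times prices) to equal the budget exactly while each channel stays inside its bounds. The helper name and all numbers are hypothetical and are not part of the library.

```python
import numpy as np


def budget_is_reachable(budget, prices, lower_bounds, upper_bounds):
    """True if some allocation inside the bounds spends exactly `budget`.

    Bounds are in media units; multiplying by prices converts them to spend.
    """
    prices = np.asarray(prices, dtype=float)
    min_spend = float(np.sum(np.asarray(lower_bounds, dtype=float) * prices))
    max_spend = float(np.sum(np.asarray(upper_bounds, dtype=float) * prices))
    return min_spend <= budget <= max_spend


prices = np.array([0.5, 2.0])       # hypothetical cost per media unit
lower = np.array([800.0, 80.0])     # hypothetical bounds over the whole horizon
upper = np.array([1200.0, 120.0])

ok_budget = budget_is_reachable(600.0, prices, lower, upper)
too_large = budget_is_reachable(5000.0, prices, lower, upper)
print(ok_budget, too_large)
```

If the budget falls outside the range the bounds allow, no allocation can satisfy the constraint, which is one way a solver can end up with a degenerate subproblem.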

Can you show what the response curves look like?

virithavanama commented 2 years ago

I have attached the response curves. But I have a few questions:

  1. "extra_features": I'm passing impressions into this. Should I pass the sum of all the channels' impressions as a single array, or each channel separately?
  2. If we want to run "find_optimal_budgets" for the future and don't have the extra_features data (because it is in the future), can we still get a good budget estimate, or does it depend on the extra features?
  3. We have "bounds_lower_pct" and "bounds_upper_pct" for all the channels, but can we set limits per channel?
  4. Changing "bounds_lower_pct" does not change anything.
  5. One channel's optimal budget is becoming 0. How can we avoid that?
pabloduque0 commented 2 years ago
  1. No, extra features should not be media variables; they are meant to be any non-media variables (promotions, price index, ...).
  2. For the optimization, if you don't have the future values of those variables, you can just pass a historic average.
  3. Yes, you can pass one value per channel; it accepts arrays.
  4. That is probably because the bound is already 0.
  5. That is probably because the model has not learned much saturation from the data for the other channel, so it still sees a linear ROI for that channel and tells you to allocate more budget there. If you haven't already, you can try the hill model, which tends to saturate better, or modify the priors for the saturation parameters.
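
To make points 2 and 3 concrete, here is a minimal NumPy sketch on synthetic data. It assumes the bounds are a percentage band around each channel's historical mean, scaled by the number of periods; that rule, the shapes, and the numbers are assumptions for illustration rather than a statement of the library's exact behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time_periods = 8                                 # periods to optimise over
media = rng.uniform(10.0, 100.0, size=(52, 3))     # (weeks, channels)
media[:, 2] = 0.0                                  # a channel with no history
extra_features = rng.normal(size=(52, 2))          # non-media variables only

# Point 2: repeat the historic average for every future period.
future_extra_features = np.tile(
    extra_features.mean(axis=0), (n_time_periods, 1))

# Point 3: one percentage per channel instead of a single scalar.
bounds_lower_pct = np.array([0.1, 0.3, 0.2])
bounds_upper_pct = np.array([0.2, 0.2, 0.5])

# Assumed rule: a percentage band around the historical mean per channel.
mean_media = media.mean(axis=0)
lower = np.maximum(mean_media * (1 - bounds_lower_pct), 0.0) * n_time_periods
upper = mean_media * (1 + bounds_upper_pct) * n_time_periods

# Under this rule, a zero mean gives zero bounds whatever the percentages are.
print(future_extra_features.shape, lower[2], upper[2])
```

Under that assumed rule, a channel whose historical mean is 0 ends up with lower and upper bounds of 0, so changing the percentages has no effect on it.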
sv09 commented 2 years ago

Hi Team,

I'm facing the same issue when running 'find_optimal_budgets'. Screenshots from the run:

Input -

input

Run -

run

Model Summary -

model_summary

Response Curves -

response_curves

Thank you!

virithavanama commented 2 years ago

Hi Pablo, thanks for the response. I'm still facing the same issue:

    /content/notebooks/scipy/optimize/_numdiff.py:579: RuntimeWarning: invalid value encountered in true_divide
      J_transposed[i] = df / dx
    Singular matrix E in LSQ subproblem (Exit mode 5)
    Current function value: nan

I have made all the changes you suggested, and r_hat is less than 1.1. Actually if i use data windowing for a specific range it is giving this error but when i take complete data this error is occurring. What is the reason for this error, and what can be done to avoid it?

pabloduque0 commented 2 years ago

Can you further elaborate on this sentence:

Actually if i use data windowing for a specific range it is giving this error but when i take complete data this error is occurring.
virithavanama commented 2 years ago

Basically if a channel has 0 as the media cost for many weeks, we are getting this error. If we select a period with relatively less 0s its working
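
A quick way to see how many zero weeks each channel has in a given window, as a minimal NumPy sketch. The helper name and the toy data are hypothetical; `media` stands for a (weeks, channels) array like the one passed to the model.

```python
import numpy as np


def zero_share_per_channel(media, start=None, end=None):
    """Fraction of periods with zero media in media[start:end], per channel."""
    window = np.asarray(media, dtype=float)[start:end]
    return (window == 0).mean(axis=0)


# Hypothetical data: 6 weeks, 2 channels; channel 1 is mostly switched off.
media = np.array([
    [10.0, 0.0],
    [12.0, 0.0],
    [9.0, 0.0],
    [11.0, 0.0],
    [13.0, 5.0],
    [10.0, 6.0],
])

full = zero_share_per_channel(media)            # whole history
recent = zero_share_per_channel(media, start=4) # last two weeks only
print(full, recent)
```

Comparing the shares for the failing window and a working window shows whether the zeros line up with the error.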

pabloduque0 commented 2 years ago

Could you show me the output of the following calls:

from lightweight_mmm import optimize_media

# Starting values handed to the optimiser.
starting_values = optimize_media._generate_starting_values(
    n_time_periods=n_time_periods,
    media=media_mix_model.media,
    media_scaler=media_scaler,
    budget=budget)
print(starting_values)

and


# Lower and upper bounds per channel (20% either side here).
bounds = optimize_media._get_lower_and_upper_bounds(
    media=media_mix_model.media,
    n_time_periods=n_time_periods,
    lower_pct=0.2,
    upper_pct=0.2,
    media_scaler=media_scaler)
print(bounds)

Since the media optimisation can be sensitive to user input, I might provide some more debug options in the future.

pabloduque0 commented 2 years ago

Closing due to inactivity, feel free to re-open if there are any questions/issues left.