Closed tomicapretto closed 1 week ago
hi,
I want to contribute to pymc-marketing and this seemed like a good first issue :)
quick question: by changing the formula for logistic_saturation
, I assume this also means changing all of the occurrences of lambda whenever logistic_saturation
is called (ex.: in saturation_function
), including the examples notebooks, in addition to the tests for logistic_saturation
.
is that correct?
hi, I want to contribute to pymc-marketing and this seemed like a good first issue :) quick question: by changing the formula for
logistic_saturation
, I assume this also means changing all of the occurrences of lambda wheneverlogistic_saturation
is called (ex.: insaturation_function
), including the examples notebooks, in addition to the tests forlogistic_saturation
. is that correct?
Hi @arthurmello. Thank you for your interest! We'd appreciate your contribution 🚀
It may be good to have the two different functions like with the two parameterization of the Tanh. Then we can keep the old behavior as well but also support this new parameterization. Or have an additional argument which signifies the parameterization type. Not too sure how it would look. I think keeping previous behavior might be good as the logistic saturation is primary from DelayedSaturatedMMM class.
How does that sound?
Also had the thought... What if we scale by constant ln(3) in numerator and denominator such that f(lam) = 0.5 Function at lam being exactly half saturation point
I.e. $$f(x, \lambda) = \frac{1 - e^{-\frac{x\ln(3)}{\lambda}}}{1 + e^{-\frac{x\ln(3)}{\lambda}}}$$
Thoughts on this @juanitorduz
So we could have logistic_saturation
and inverse_scaled_logistic_saturation
(and thus the corresponding LogisticSaturation
and InverseScaledLogisticSaturation
).
In logistic_saturation
nothing changes, while in InverseScaledLogisticSaturation
, we inverse lambda and multiply it by a scaling parameter (ln(3) by default).
What do you think?
So we could have
logistic_saturation
andinverse_scaled_logistic_saturation
(and thus the correspondingLogisticSaturation
andInverseScaledLogisticSaturation
). Inlogistic_saturation
nothing changes, while inInverseScaledLogisticSaturation
, we inverse lambda and multiply it by a scaling parameter (ln(3) by default). What do you think?
Yeah, I think that seems pretty good! We could use the logistic_saturation
function in the new function.
Let us know if you need any support with PR. Tag Juan and myself once you get started!
thanks @wd60622 and @juanitorduz , I've started working on it today!
I have followed all the steps from here and here to set up my local dev environment with conda, but even before making any changes, multiple tests fail.
First error comes from make check_lint
: apparently (error: `ruff <path>` has been removed. Use `ruff check <path>` instead.)
. If I run ruff check .
alone, it works. Should Makefile
say ruff check .
or am I missing something in my env? I have ruff==0.5.1
, which is compatible with environment.yml
(ruff>=0.1.4
).
Second issue comes from running mypy .
: I get 512 errors, such as:
pymc_marketing/clv/models/beta_geo.py:427: error: Missing return statement [return]
Is it related to what is being addressed on PR #823 and thus expected?
thanks @wd60622 and @juanitorduz , I've started working on it today!
I have followed all the steps from here and here to set up my local dev environment with conda, but even before making any changes, multiple tests fail.
First error comes from
make check_lint
: apparently(error: `ruff <path>` has been removed. Use `ruff check <path>` instead.)
. If I runruff check .
alone, it works. ShouldMakefile
sayruff check .
or am I missing something in my env? I haveruff==0.5.1
, which is compatible withenvironment.yml
(ruff>=0.1.4
).Second issue comes from running
mypy .
: I get 512 errors, such as:pymc_marketing/clv/models/beta_geo.py:427: error: Missing return statement [return]
Is it related to what is being addressed on PR #823 and thus expected?
I'd say to push up the changes in a PR then see what the CI/CD says about the changes. We might need to change those instructions but we will address those in separate PR
EDIT: Created a new issue for this: #825
@wd60622 I've created a draft PR for this: #827
The
logistic_saturation
function maps the reals to (-1, 1). If we use values in $[0, \infty)$, the output is in $[0, 1)$. Currently, the parametrization oflogistic_saturation
is:$$ f(x, \lambda) = \frac{1 - e^{-x\lambda}}{1 + e^{-x\lambda}} $$
$\lambda$ is called the "saturation parameter". In Bayesian Media Mix Modeling using PyMC3, for Fun and Profit it says the parameter represents the "half-saturation point" but I think it's not completely right.
If we evaluate $f$ at $1 / \lambda$ the result is $\approx 0.462$, meaning that at $1 / \lambda$ you reach almost half saturation. So $\lambda$ would be something close to the inverse of the half saturation point.
Thus, I propose to use
$$ f(x, \lambda) = \frac{1 - e^{-\frac{x}{\lambda}}}{1 + e^{-\frac{x}{\lambda}}} $$
and $\lambda$ could be interpreted as "almost half-saturation" point.