Closed jpansnap closed 7 months ago
Just took a quick look at this: could the same be achieved using only the .fillna(0)
element and skipping the skipna=True
and .astype(float)
?
Just took a quick look at this: could the same be achieved using only the
.fillna(0)
element and skipping theskipna=True
and.astype(float)
?
Just .fillna(0)
will give the division by zero error for example the first record is missing - that's why used .asype(float)
Adding skipna=True
just trying to be more explicit and also in case some weird cased that filling na didn't work
We had a discussion this morning, just want to provide some of our discussion here:
What do you think?
We had a discussion this morning, just want to provide some of our discussion here:
- it seems to me fillna(0) should be a decision by the user. Should we consider to provide the option to the user what is the value they want to fill
- If there is NA in the data, shouldn't the user handle it at the beginning of the model instead of at the plotting stage? Maybe we should just throw an error and suggest user to handle NA by themselves.
- To the very least, we should let user know their data has NA and causalml filled it.
What do you think?
This is a quick fix as the main issue right now for this function which will just return empty values - not only plotting but also the AUUC will be off.
For 2. For my use case, I didn't use CausalML for modeling but just plotting - I would assume we want to provide support for individual module as well. For 1 and 3 - we can modify the code to be more robust please let me know which way we'd prefer to.
We had a discussion this morning, just want to provide some of our discussion here:
- it seems to me fillna(0) should be a decision by the user. Should we consider to provide the option to the user what is the value they want to fill
- If there is NA in the data, shouldn't the user handle it at the beginning of the model instead of at the plotting stage? Maybe we should just throw an error and suggest user to handle NA by themselves.
- To the very least, we should let user know their data has NA and causalml filled it.
What do you think?
This is a quick fix as the main issue right now for this function which will just return empty values - not only plotting but also the AUUC will be off.
For 2. For my use case, I didn't use CausalML for modeling but just plotting - I would assume we want to provide support for individual module as well. For 1 and 3 - we can modify the code to be more robust please let me know which way we'd prefer to.
Got it, how about we go with option 1(provide the option to the user)? It is more robust and gives users more flexibility.
This is a quick fix as the main issue right now for this function which will just return empty values - not only plotting but also the AUUC will be off.
For 2. For my use case, I didn't use CausalML for modeling but just plotting - I would assume we want to provide support for individual module as well. For 1 and 3 - we can modify the code to be more robust please let me know which way we'd prefer to.
What about throwing an error with a message informing that there's NaN in the input instead of returning null? It's not a good practice to impute NaN silently. Another option is that we can add an input argument for users to specify whether to impute NaN or not with the default being no imputation.
This is a quick fix as the main issue right now for this function which will just return empty values - not only plotting but also the AUUC will be off. For 2. For my use case, I didn't use CausalML for modeling but just plotting - I would assume we want to provide support for individual module as well. For 1 and 3 - we can modify the code to be more robust please let me know which way we'd prefer to.
What about throwing an error with a message informing that there's NaN in the input instead of returning null? It's not a good practice to impute NaN silently. Another option is that we can add an input argument for users to specify whether to impute NaN or not with the default being no imputation.
Yeah agree, we could throw the error and add the input argument. Also saw the comment in https://github.com/uber/causalml/pull/720
Proposed changes
Noticed that if there are missing values for the outcome column when calculate cumulative lift, the current
.cumsum()
function default asskipna=False
which will return NaN when there are missing values as a results the calculation of lift could be offPropose change to
.fillna(0).cumsum(skipna=True).astype(float)
(change to float to avoid the division by zero error as well)Types of changes
What types of changes does your code introduce to CausalML? Put an
x
in the boxes that applyChecklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.Further comments
If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc. This PR template is adopted from appium.