Closed vincentarelbundock closed 1 year ago
@LamAdr Do you need any clarification?
Hi @vincentarelbundock, I have a first attempt at step 1. I am not sure of the workflow. Should I push my local branch? I seem to lack the permission for that.
Very nice!
Normally, you need to fork the repo (a fork is an independent copy of the repo under your own account), push your local branch to your own fork, and then open a "Pull Request". The GitHub docs are pretty good, I think:
ChatGPT will almost surely give you terminal commands that are very close to what you need for this. Super common operations...
Step 1:

- Write a `build_plot()` function that accepts `model` and `condition` arguments.
- Use `get_modeldata()` to extract the original data used to fit the model. Call it `modeldata`.
- Use `assert` to make sure that the `condition` argument conforms to required types, and return a helpful message if it doesn't.
- Check that the variables named in `condition` are in `modeldata.columns`.
- If `condition` is a list of strings, then use `utils/get_variable_type()` to determine what type is the variable:
  - If the first element of `condition` is numeric, take 100 equally-spaced points between the min and max.
  - If the other elements of `condition` are numeric, take Tukey's 5-numbers: https://en.wikipedia.org/wiki/Five-number_summary
  - Otherwise, take the unique values (`modeldata["variablename"].unique()`). If there are more than 10 unique values, use `assert` to return an informative error to say that it is not supported.
- If `condition` is a dict, then we take the values supplied by the user explicitly instead of using our own summaries.
- Use the `datagrid()` function to create a data frame.

Step 2: Pass `build_plot()` output to `predictions()`
- Pass `model` and the data frame you created to the `newdata` argument in the `predictions` function. In principle, this should give you a nice data frame of predictions.

Step 3: Pass the result to `seaborn` or `matplotlib`
- The `estimate` column of the data frame in Step 2 is the Y-axis.
- `condition` determines the variable on the X-axis.
- Add `find_response()` to the `utils.py` file. This extracts the name of the dependent variable as a string. I think that in `statsmodels` this is stored in the model's `endog_names` or some similar attribute (`exog_names` lists the predictors, not the response).

Step 4: `newdata` and `by` arguments

This is an alternative to `condition`. These arguments cannot be used at the same time, and we need to use `assert` to raise informative errors if the user tries to do it.

Read the `marginaleffects` for R documentation, vignette, and code to figure this out. Should be pretty straightforward, but ask me if you can't figure it out after 30 minutes of work.

Step 5: Add the other arguments.
This is mainly a question of passing additional arguments to the `predictions()` call we used in Step 2.

Step 6: Repeat for `plot_slopes()` and `plot_comparisons()`.

The challenge here will be "Don't repeat yourself": how can we make as much of the code as possible reusable for all three types of plots?

The best way to approach this is to do it for `plot_predictions()` first. Then we can refactor the code to make it work for all three.
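
As a starting point for Step 1, here is a minimal sketch of the value-selection rules, assuming `modeldata` is a pandas data frame. `condition_values()` is a hypothetical helper name, not part of the package. The dtype checks stand in for `get_variable_type()`, and plain quantiles are used as an approximation of Tukey's five numbers.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def condition_values(modeldata, condition):
    """Hypothetical helper: map each variable in `condition` to the values to plot."""
    assert isinstance(condition, (list, dict)), (
        "`condition` must be a list of column names or a dict."
    )
    assert all(isinstance(name, str) for name in condition), (
        "Variable names in `condition` must be strings."
    )
    missing = [name for name in condition if name not in modeldata.columns]
    assert not missing, f"Variables not found in the model data: {missing}"

    if isinstance(condition, dict):
        # Values supplied explicitly by the user are kept as they are.
        return dict(condition)

    values = {}
    for position, name in enumerate(condition):
        column = modeldata[name]
        # Booleans count as numeric in pandas, so exclude them explicitly.
        if is_numeric_dtype(column) and not is_bool_dtype(column):
            if position == 0:
                # First variable: 100 equally-spaced points between min and max.
                values[name] = np.linspace(column.min(), column.max(), 100)
            else:
                # Other numeric variables: min, quartiles, max
                # (an approximation of Tukey's five-number summary).
                values[name] = np.quantile(column, [0, 0.25, 0.5, 0.75, 1])
        else:
            unique = column.unique()
            assert len(unique) <= 10, (
                f"`{name}` has more than 10 unique values, which is not supported."
            )
            values[name] = unique
    return values


# Example with made-up data.
example = pd.DataFrame(
    {"x": np.arange(50.0), "z": np.arange(50.0) ** 2, "g": ["a", "b"] * 25}
)
grid_values = condition_values(example, ["x", "z", "g"])
```

The resulting dict of values is the kind of input that could then be handed to `datagrid()` to build the data frame for Step 2.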