pycaret / pycaret

An open-source, low-code machine learning library in Python
https://www.pycaret.org
MIT License

[DOC]: What does imputation_type="simple" do? #3226

Closed: jameshfisher closed this issue 1 year ago

jameshfisher commented 1 year ago

Location of the documentation

I'm reading the regression API docs, which say:

imputation_type: str or None, default = ‘simple’

The type of imputation to use. Can be either ‘simple’ or ‘iterative’. If None, no imputation of missing values is performed.

Documentation problem

What does imputation_type="simple", the default, actually do?

I found a good description of imputation_type="iterative" in this unofficial article. I can also see that you can set the iterative imputer, which defaults to lightgbm.

But there is no description of what simple means or does.

Suggested fix for documentation

Describe this in the API docs somewhere, or link to something that describes it.

jameshfisher commented 1 year ago

I'm looking at https://github.com/pycaret/pycaret/blob/ff1a8c905ad722d9de4fd8edfe793794a25775c1/pycaret/internal/preprocess/preprocessor.py#L414-L466

Which suggests imputation_type="simple" means:

Do I have this right?

Yard1 commented 1 year ago

Not exactly. Simple imputation fills missing values with the mean or median for numeric features, and with the most frequent value or a constant for categorical features. See https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html
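
As a rough illustration of the simple strategy, here is a minimal sketch that uses scikit-learn's `SimpleImputer` directly. The toy DataFrame and the column names `age` and `city` are made up for the example, and the strategies shown (`mean` for numeric, `most_frequent` for categorical) are just one of the combinations mentioned above, not necessarily pycaret's defaults.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: one numeric and one categorical column, each with a gap.
df = pd.DataFrame(
    {
        "age": pd.Series([20.0, np.nan, 40.0], dtype=float),
        "city": pd.Series(["Paris", "Paris", np.nan], dtype=object),
    }
)

# Numeric features: replace missing values with the column mean.
num_imputer = SimpleImputer(strategy="mean")
age_filled = num_imputer.fit_transform(df[["age"]]).ravel()

# Categorical features: replace missing values with the most frequent value.
cat_imputer = SimpleImputer(strategy="most_frequent")
city_filled = cat_imputer.fit_transform(df[["city"]]).ravel()

print(age_filled.tolist())
print(city_filled.tolist())
```

Swapping `strategy="mean"` for `"median"`, or `"most_frequent"` for `"constant"` (together with `fill_value=...`), gives the other variants.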

Iterative imputation starts with simple imputation and then fits an ML model for each column with missing values, using the values of the other columns to predict what should be imputed. This is repeated several times, hence the name iterative imputation. Hope that helps - please close the issue if you do not have any more questions.
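
For comparison, the iterative idea can be sketched with scikit-learn's `IterativeImputer`. This only illustrates the general technique: it uses scikit-learn's default estimator rather than lightgbm, it may differ from what pycaret does internally, and the small array is invented for the example.

```python
import numpy as np

# IterativeImputer is still experimental in scikit-learn and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical data in which the second column is roughly twice the first.
X = np.array(
    [
        [1.0, 2.0],
        [2.0, 4.0],
        [3.0, 6.0],
        [4.0, np.nan],
        [np.nan, 10.0],
    ]
)

# Starts from a simple (mean) fill, then repeatedly models each column
# with missing values as a function of the other columns.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Unlike the mean fill, the imputed entries here depend on the other column in the same row, which is the point of the iterative approach.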

jameshfisher commented 1 year ago

That's helpful, thank you!