tirthajyoti / Machine-Learning-with-Python

Practice and tutorial-style notebooks covering a wide variety of machine learning techniques
https://machine-learning-with-python.readthedocs.io/en/latest/
BSD 2-Clause "Simplified" License
3.12k stars, 1.8k forks

Statistically significant function in regression model #16

Open MatthiVH opened 4 years ago

MatthiVH commented 4 years ago

Hi,

I'm wondering what the yes_no function does in the following notebook: https://github.com/tirthajyoti/Machine-Learning-with-Python/blob/master/Regression/Regression_Diagnostics.ipynb

def yes_no(b):
    if b:
        return 'Yes'
    else:
        return 'No'

Is it supposed to decide whether a parameter is statistically significant for the model? What does b refer to, and what threshold does it use to decide that a parameter is not statistically significant?

I usually look at the p-values in the statsmodels OLS summary table and treat a parameter as significant when its p-value falls below 0.05. In this notebook something else seems to be happening, so could you elaborate on it a bit: what is b, how is it calculated, what is its threshold, and how can the threshold be changed from 0.01 to 0.05? Also, when the p-value in the OLS table is above 0.05 but the yes_no function reports the parameter as significant, what should I do: leave the parameter out or keep it?

Kind regards, Matthias

tirthajyoti commented 3 years ago

Strangely, I don't remember its context now, but I agree with your simple threshold-based approach. Would you like to create a PR that rewrites the function to take a threshold parameter?
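
For illustration, here is a minimal sketch of what such a threshold-based rewrite might look like. It is not the notebook's actual code: the parameter names `p_value` and `alpha`, the default of 0.05, and the example p-values are all assumptions. The original `yes_no(b)` only maps a boolean to a string, so the sketch moves the comparison against the threshold inside the function.

```python
def yes_no(p_value, alpha=0.05):
    """Return 'Yes' if p_value is below the significance level alpha, else 'No'.

    Hypothetical rewrite: the original yes_no(b) took an already-computed
    boolean, so the threshold was decided by the caller.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError(f"p_value must be in [0, 1], got {p_value}")
    if not 0.0 < alpha < 1.0:
        raise ValueError(f"alpha must be in (0, 1), got {alpha}")
    return 'Yes' if p_value < alpha else 'No'


# Made-up p-values for illustration only (not taken from the notebook).
example_p_values = {'x1': 0.003, 'x2': 0.03, 'x3': 0.20}

for name, p in example_p_values.items():
    # Compare the default 0.05 threshold with a stricter 0.01 threshold.
    print(name, yes_no(p), yes_no(p, alpha=0.01))
```

With these made-up values, `x2` (p = 0.03) counts as significant at 0.05 but not at 0.01, which is the kind of disagreement described in the question. With a fitted statsmodels OLS result, the per-parameter p-values from its `pvalues` attribute could be passed to the function in the same way.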