rickecon / StructEst_W20

MACS 40200 (Winter 2020): Structural Estimation

[PS2_Q2] Linear regression in Python #8

Open frederickz opened 4 years ago

frederickz commented 4 years ago

I am interested in using a regression to generate the initial guesses. Is sklearn a good package for this? Thanks.

rickecon commented 4 years ago

@zhuyuming96. Python has three primary packages for running linear regressions, in addition to the direct linear algebra approach of inv(X'X)(X'Y). You can find examples of the first two methods in QuantEcon's "Linear Regression in Python" notebook, in the sections on Simple Linear Regression and on Endogeneity. For the third method, I have some examples in my classification (discrete choice) notebook from MACS 30150.
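As a point of reference, the linear algebra approach mentioned above can be sketched with NumPy. This is a minimal illustration on simulated data, not code from the course notebooks: the variable names (`x`, `y`, `X`, `beta_hat`) and the data-generating process are assumptions made up for the example. It solves the normal equations directly rather than forming the inverse explicitly, which gives the same estimates and is usually more stable numerically.

```python
import numpy as np

# Hypothetical simulated data: y = 1.0 + 2.0 * x + noise
rng = np.random.default_rng(seed=0)
n_obs = 200
x = rng.normal(size=n_obs)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n_obs)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n_obs), x])

# Solve the normal equations (X'X) b = X'y instead of computing inv(X'X)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # [intercept, slope]
```

The resulting `beta_hat` is the kind of vector that could be passed along as an initial guess to a later estimation routine.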

  1. The statsmodels.api package is probably the most similar to STATA. See the Simple Linear Regression section of the QuantEcon notebook on "Linear Regression in Python".

    import statsmodels.api as sm

    # Create object that sets up the regression
    # (assumes df1 already contains a 'const' column of ones)
    reg1 = sm.OLS(endog=df1['logpgp95'], exog=df1[['const', 'avexpr']], missing='drop')

    # Actually estimate the coefficients
    results = reg1.fit()

    # Print STATA-like regression output
    print(results.summary())


2. Similar to `statsmodels.api` is the `linearmodels.iv` package for instrumental variables models. See the [Endogeneity](https://python.quantecon.org/ols.html#Endogeneity) section of the QuantEcon notebook on "Linear Regression in Python".
```python
from linearmodels.iv import IV2SLS

# Two-stage least squares: avexpr is endogenous and instrumented by logem4
iv = IV2SLS(dependent=df4['logpgp95'],
            exog=df4['const'],
            endog=df4['avexpr'],
            instruments=df4['logem4']).fit(cov_type='unadjusted')

# summary is a property on linearmodels results, so it takes no parentheses
print(iv.summary)
```
  3. scikit-learn also has some regression commands, but it is geared more toward point estimates and prediction than toward standard errors, so getting standard errors and doing hypothesis testing takes more work. See the classification (discrete choice) notebook from Dr. Evans' MACS 30150 class.

    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import LogisticRegression

    # Fit a linear regression and predict on the test set
    LinReg = LinearRegression()
    LinReg.fit(X_train, y_train)
    y_LinReg_pred = LinReg.predict(X_test)

    # Fit a logistic regression and predict on the test set
    LogReg = LogisticRegression()
    LogReg.fit(X_train, y_train)
    y_LogReg_pred = LogReg.predict(X_test)
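To make the point about standard errors concrete, here is a rough sketch of computing classical (homoskedastic) OLS standard errors by hand from a vector of point estimates. The helper `ols_standard_errors` and the simulated data are hypothetical and not part of any library. To keep the example dependent only on NumPy, `np.linalg.lstsq` stands in for the fitted model; with scikit-learn the same vector could presumably be assembled from `LinReg.intercept_` and `LinReg.coef_`, with a column of ones prepended to `X`.

```python
import numpy as np


def ols_standard_errors(X, y, beta):
    """Classical OLS standard errors for point estimates beta.

    X must include the constant column if beta includes an intercept.
    Assumes homoskedastic, uncorrelated errors.
    """
    n_obs, n_params = X.shape
    resid = y - X @ beta
    # Unbiased estimate of the error variance
    sigma2 = resid @ resid / (n_obs - n_params)
    # Var(beta_hat) = sigma^2 * inv(X'X)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(cov))


# Hypothetical simulated data: y = 0.5 + 1.5 * x + noise
rng = np.random.default_rng(seed=1)
n_obs = 500
x = rng.normal(size=n_obs)
y = 0.5 + 1.5 * x + rng.normal(size=n_obs)
X = np.column_stack([np.ones(n_obs), x])

# Stand-in for point estimates from a fitted model
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

std_err = ols_standard_errors(X, y, beta_hat)
t_stats = beta_hat / std_err  # t-statistics for H0: coefficient = 0

print(std_err)
print(t_stats)
```

In practice, statsmodels reports these quantities directly in `results.summary()`, which is one reason it tends to be more convenient when inference matters.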

frederickz commented 4 years ago

Thank you so much, especially all the resources you provide! These really help.