ijyliu / ECMA-31330-Project

Econometrics and Machine Learning Group Project

Measurement Error Idea #12

Closed ijyliu closed 3 years ago

ijyliu commented 3 years ago

We aim to put forward a new approach to handling measurement error in regression and deal with attenuation bias. We will use a factor model/method to extract a new explanatory/independent variable, rather than take an instrumental variables approach, then proceed with regressing a dependent variable on this extracted variable.

This will be a mostly theory-based project. The main focus will be a Monte Carlo simulation showing good performance; after that we will discuss or implement an application.

Prior paper: https://warwick.ac.uk/fac/soc/economics/staff/knagasawa/PartialEffects.pdf

nicomarto commented 3 years ago

Main Idea:

Propose PCA/factor models as a way to tackle the attenuation bias caused by variables measured with classical error, as an alternative to the IV approach widely used in the literature.
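For reference, this is the textbook attenuation result under classical measurement error. The notation is an assumption made here, not something fixed in this thread: $x^*$ is the true regressor, $x$ is the observed proxy, and the measurement error $u$ is uncorrelated with both $x^*$ and the regression error $\varepsilon$.

```latex
\begin{aligned}
y &= \beta x^* + \varepsilon, \qquad x = x^* + u, \\
\operatorname{plim}\, \hat{\beta}_{OLS}
  &= \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}
   = \beta \, \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_u^2}.
\end{aligned}
```

The ratio multiplying $\beta$ lies between 0 and 1, so the OLS slope on a single proxy is pulled toward zero, and more so the larger the error variance $\sigma_u^2$.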

Stage 1: Theoretical formulation and derivation of the proposed estimator and its properties

Stage 2: Simulation. 1) We define a certain DGP: we take our 'x' variable and construct 'y' according to coefficient values that we choose, so we know the true DGP.

2) We create a bunch of proxies for 'x' by adding different random shocks, each with mean zero and a different variance, to our variable 'x'. We can show by Monte Carlo simulation that regressing 'y' on any of the proxies gives a biased estimator.

3) We take our bunch of proxies and apply PCA/factor models to 'extract' (estimate) 'x', and then run OLS of 'y' on our extracted 'x' in the simulation. Hopefully we show that the bias goes away. We could run different simulations varying the number of proxies used in the PCA/factor models and hopefully find (as I believe) that the more proxies we have for 'x', the less bias we have.
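The three simulation steps above could be sketched roughly as follows. This is a minimal illustration under assumptions that are not in the thread: standard normal 'x' and regression noise, a true slope of 2, a NumPy eigendecomposition of the proxies' correlation matrix standing in for PCA, and a rescaling of the first component so that its weights on the raw proxies sum to one (one possible way to keep the extracted variable in the units of 'x'). The function names are hypothetical and this is not the code in the repository.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def first_pc_in_x_units(proxies):
    """First principal component of the standardized proxies, rescaled so that
    its weights on the raw proxies sum to one (an assumed normalization)."""
    centered = proxies - proxies.mean(axis=0)
    sds = centered.std(axis=0, ddof=1)
    corr = np.corrcoef(centered, rowvar=False)
    _, eigvecs = np.linalg.eigh(corr)   # eigenvalues come back in ascending order
    weights = eigvecs[:, -1] / sds      # loadings expressed on the raw proxies
    weights = weights / weights.sum()   # sum-to-one scale; also fixes the sign
    return centered @ weights


def monte_carlo(error_sds, beta=2.0, n=500, reps=200, seed=0):
    """Average OLS slopes over `reps` draws: one per proxy, and one for the
    PCA-extracted regressor. `error_sds` needs at least two entries."""
    rng = np.random.default_rng(seed)
    error_sds = np.asarray(error_sds, dtype=float)
    p = error_sds.size
    proxy_slopes = np.empty((reps, p))
    pca_slopes = np.empty(reps)
    for r in range(reps):
        # Step 1: known DGP.
        x = rng.normal(size=n)
        y = beta * x + rng.normal(size=n)
        # Step 2: proxies = x plus mean-zero shocks with different variances.
        proxies = x[:, None] + rng.normal(size=(n, p)) * error_sds
        proxy_slopes[r] = [ols_slope(proxies[:, j], y) for j in range(p)]
        # Step 3: extract 'x' from the proxies, then regress y on it.
        pca_slopes[r] = ols_slope(first_pc_in_x_units(proxies), y)
    return proxy_slopes.mean(axis=0), pca_slopes.mean()
```

Calling `monte_carlo([0.8, 1.0, 1.2])` returns the average slope for each single proxy and for the extracted regressor, which can be compared with the true `beta`. Because the extracted variable here is a weighted average of noisy proxies, its slope is still attenuated in general; whether it beats the best single proxy depends on the error variances and on how many proxies are available.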

Stage 3: Having shown theoretically and by simulation that our idea works, we carry out an empirical application resting on the foundations built in Stages 1 and 2, so that we are not just running random regressions.

ijyliu commented 3 years ago

See section 4.2: https://www-annualreviews-org.proxy.uchicago.edu/doi/pdf/10.1146%2Fannurev-economics-080315-015058

Crazy people (middle of p.354):

In fact, it has recently been suggested that surveys should be designed to elicit multiple measurements that may be mismeasured rather than attempting to gather exact data (Browning & Crossley 2009).

ijyliu commented 3 years ago

Straightforward explanation though this webpage doesn't really add anything new: https://advstats.psychstat.org/book/factor/index.php#factor-analysis

Also another quick explainer but idk if it's clear: http://web.pdx.edu/~newsomj/semclass/ho_latent.pdf

ijyliu commented 3 years ago

Measurement error regression models are factor analysis models, the latent ‘correct’ regressors are the factors.

Interesting take there

https://www.sciencedirect.com/science/article/pii/0304407694016763

ijyliu commented 3 years ago

Concern about measurement error in the latent variables/factors themselves: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4787301/

ijyliu commented 3 years ago

Here's a clear problem I'm coming across applying this: the units become impossible to interpret when we use a factor
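To make the units problem concrete, here is a small sketch, assuming simulated data, a true slope of 2, and a plain eigendecomposition of the proxy covariance matrix as the factor extraction (all of these are illustrative choices, not the project's code). Multiplying the extracted factor by any constant changes the regression coefficient by the inverse of that constant while leaving the fitted values untouched, so the coefficient's magnitude carries no information unless a scale is pinned down.

```python
import numpy as np


def ols_fit(x, y):
    """Intercept and slope from a univariate OLS regression of y on x."""
    xc = x - x.mean()
    slope = float(xc @ (y - y.mean()) / (xc @ xc))
    intercept = float(y.mean() - slope * x.mean())
    return intercept, slope


rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                          # true regressor (unobserved in practice)
y = 2.0 * x + rng.normal(size=n)                # assumed true slope of 2
proxies = x[:, None] + rng.normal(size=(n, 3))  # three noisy measurements of x

# First principal component of the proxies; its scale and sign are arbitrary.
centered = proxies - proxies.mean(axis=0)
_, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
factor = centered @ eigvecs[:, -1]

# Same factor measured in two different "units".
a1, b1 = ols_fit(factor, y)
a2, b2 = ols_fit(10.0 * factor, y)
fitted1 = a1 + b1 * factor
fitted2 = a2 + b2 * (10.0 * factor)

print("slope on factor:     ", b1)
print("slope on 10 * factor:", b2)
print("fitted values identical:", np.allclose(fitted1, fitted2))
```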

nicomarto commented 3 years ago
  1. For simplicity, I think it is better to have only one regressor to which we add errors.

Also, I think we have to generate more "errors" in order to have a set of variables that mismeasure the true variable:

https://github.com/ijyliu/ECMA-31330-Project/blob/017dea750638acf37419673ea176b24d4ed42ba1/Source/Factors_and_Measurement_Error/Simulate_DGP.py#L41

  2. I would choose a rho lower than .5. We do not need to assume multicollinearity in our setting, since that is not the problem we want to address in this project.

Also, I would pick different variances when constructing the set of mismeasured regressors, to make it more interesting:

https://github.com/ijyliu/ECMA-31330-Project/blob/017dea750638acf37419673ea176b24d4ed42ba1/Source/Factors_and_Measurement_Error/Simulate_DGP.py#L70

ijyliu commented 3 years ago

Also, I think we have to generate more "errors" in order to have a set of variables that mismeasure the true variable

I mean, we can leave it in the code and just set x_measurement_errors (the variance) to 0 for those coefficients? Kind of seems more realistic to have multiple mismeasured variables to me, but either way is probably fine.

Good point about the variances and the rhos; we should definitely adjust that. I just printed that DGP setting to make sure the function worked.
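A rough sketch of the settings discussed here, for illustration only: equicorrelated true covariates with a rho below .5, a different measurement error variance per covariate, and a variance of 0 for any covariate that should be observed exactly. The function `simulate_covariates` is hypothetical and is not the function in Simulate_DGP.py; only the parameter name `x_measurement_errors` is borrowed from the discussion above.

```python
import numpy as np


def simulate_covariates(n, rho, x_measurement_errors, seed=0):
    """Draw true covariates with pairwise correlation `rho` and unit variance,
    then add classical measurement error with one variance per covariate.
    A variance of 0 leaves that covariate measured without error."""
    rng = np.random.default_rng(seed)
    variances = np.asarray(x_measurement_errors, dtype=float)
    k = variances.size
    # Equicorrelation covariance matrix: ones on the diagonal, rho elsewhere.
    cov = np.full((k, k), rho)
    np.fill_diagonal(cov, 1.0)
    true_x = rng.multivariate_normal(np.zeros(k), cov, size=n)
    # Independent mean-zero errors, scaled column by column.
    noise = rng.normal(size=(n, k)) * np.sqrt(variances)
    return true_x, true_x + noise
```

For example, `simulate_covariates(1000, rho=0.3, x_measurement_errors=[0.0, 0.5, 2.0])` would leave the first covariate exactly measured and mismeasure the other two with increasing error variance.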

nicomarto commented 3 years ago

I mean, we can leave it in the code and just set x_measurement_errors (the variance) to 0 for those coefficients? Kind of seems more realistic to have multiple mismeasured variables to me, but either way is probably fine.

Sounds good. Let's leave it in the code, and then when we iterate with Nadav we can adjust according to how much they want us to develop.

ijyliu commented 3 years ago

Make case for prediction only?

ijyliu commented 3 years ago

Switched to measurement error in the covariates but not in the variable of interest; we can make stronger claims than just prediction.