ijyliu / ECMA-31330-Project

Econometrics and Machine Learning Group Project

Centering/Standardizing #30

Closed ijyliu closed 3 years ago

ijyliu commented 3 years ago

We are currently subtracting the mean and dividing by the SD, but maybe we should only subtract the mean.

ijyliu commented 3 years ago

Do this for all variables, or for covariates only?

ijyliu commented 3 years ago

@marionoro and I discussed this earlier.

I looked at the simulation code and it seemed like it was subtracting the mean and dividing by the SD.

But I don't remember: was it applied to the dependent variable and the variable of interest, or just the covariates? I guess it probably doesn't matter, since those were probably mean zero and variance one anyway.

This has implications for #26, and for whether or not to include an intercept.

paul-opheim commented 3 years ago

I only standardize the covariate measurements. I do not standardize the y or x variables. How does this matter for whether or not to include the intercept?

ijyliu commented 3 years ago

I think there's no need to include an intercept if you are standardizing (the intercept is roughly the mean).
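A quick way to check the intercept point is a toy regression. This is only a sketch with made-up data (the names `x`, `y`, `ols` and all numeric values are assumptions, not the project's simulation code). It shows that once both sides of the regression are demeaned, the fitted intercept is numerically zero and the slope is unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(loc=2.0, scale=1.5, size=n)   # hypothetical regressor with nonzero mean
y = 3.0 + 0.7 * x + rng.normal(size=n)       # hypothetical outcome with true intercept 3

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Raw data: the intercept absorbs the means of y and x.
b_raw = ols(y, np.column_stack([np.ones(n), x]))

# Demeaned data with an intercept column: the intercept is numerically zero.
yc, xc = y - y.mean(), x - x.mean()
b_centered = ols(yc, np.column_stack([np.ones(n), xc]))

# Demeaned data without an intercept: the slope is the same as before.
b_no_intercept = ols(yc, xc[:, None])
```

Note this only holds when every variable in the regression is demeaned. If only the covariates are standardized and y or x keep a nonzero mean, dropping the intercept changes the fit.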

ijyliu commented 3 years ago

So y and x are mean 0 and variance 1 in the simulations right? In that case we can close this because standardizing wouldn't do anything

ijyliu commented 3 years ago

I think technically we would want the simulation code to match the empirical situation pretty exactly in terms of standardizing, which is the case if y and x are mean 0 and var 1

paul-opheim commented 3 years ago

> So y and x are mean 0 and variance 1 in the simulations right? In that case we can close this because standardizing wouldn't do anything

I don't think that this is right. Since y = beta1 x + beta2 z, y will have a mean of 0 but it will have a variance that is greater than 1 (it's the sum of two normal random variables, at least one of which has a variance of 1 while the other has a strictly positive variance). x would have a variance of around 1, although it probably wouldn't be at exactly 1.

However, I don't think that we would need an intercept given that the mean of y should be around 0?
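The variance argument above can be checked numerically. This is a rough sketch, assuming x and z are unit-variance normals with correlation `rho` and that y has no extra noise term; `beta1`, `beta2`, and `rho` are hypothetical values, not the ones in the simulation code. Under those assumptions Var(y) = beta1^2 + beta2^2 + 2 beta1 beta2 rho, which is above 1 for these values but does depend on the coefficients chosen:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1, beta2, rho = 1.0, 0.5, 0.3   # hypothetical coefficients and correlation

# Draw (x, z) as unit-variance normals with correlation rho.
cov = np.array([[1.0, rho], [rho, 1.0]])
x, z = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
y = beta1 * x + beta2 * z

# Theoretical variance of y under the assumptions above.
var_theory = beta1**2 + beta2**2 + 2 * beta1 * beta2 * rho
var_sample = y.var(ddof=1)

print(f"mean(y) = {y.mean():.4f}")
print(f"var(y)  = {var_sample:.4f} (theory: {var_theory:.4f})")
print(f"var(x)  = {x.var(ddof=1):.4f} (close to 1, not exactly 1)")
```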

ijyliu commented 3 years ago

Ugh. Yeah, we won’t need an intercept. But we need to figure out if we should standardize y and x or not.

ijyliu commented 3 years ago

@nicomarto @marionoro so here's a bit of ickiness: we have to pick between standardizing the covariates for just the PCA regression, or standardizing them for all the regressions. I feel like the normal thing to do would be to standardize them just for the PCA. But then that doesn't really seem like it's setting up a valid comparison between the methods. On the other hand, if we standardize them for everything, it feels kind of arbitrary to have some variables standardized and others not in, say, the regular OLS regression with all the covariates and the variable of interest stuck in.

I looked at Wikipedia again and standardizing y (or at least demeaning it) seems valid.

paul-opheim commented 3 years ago

I think we should standardize the covariates (and only the covariates) for every specification. To me, it seems fair to do so since these regressions are making a distinction between the variable of interest and the other things that we are controlling for, and so it seems valid to treat those things differently from each other. That way seems cleaner to me.
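Here is a rough sketch of what "standardize only the covariates, in every specification" could look like. The data-generating process, the names (`standardize`, `ols`, `Z`, `Zs`, `pc1`), and all numeric values are assumptions for illustration, not the project's actual code:

```python
import numpy as np

def standardize(Z):
    """Subtract each column's mean and divide by its sample standard deviation."""
    return (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)

def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n, p = 1000, 5
z_true = rng.normal(size=n)                    # hypothetical latent covariate
x = 0.5 * z_true + rng.normal(size=n)          # variable of interest, left unscaled
scales = np.array([1.0, 2.0, 5.0, 10.0, 0.5])  # measurements live on different scales
Z = (z_true[:, None] + rng.normal(size=(n, p))) * scales  # noisy covariate measurements
y = 1.0 * x + 1.0 * z_true + rng.normal(size=n)           # outcome, left unscaled

Zs = standardize(Z)   # the only variables that get standardized
ones = np.ones(n)

# Specification 1: OLS on x and all covariates (raw vs. standardized).
b_ols_raw = ols(y, np.column_stack([ones, x, Z]))
b_ols = ols(y, np.column_stack([ones, x, Zs]))

# Specification 2: PCA regression on x and the first principal component of Zs.
_, _, Vt = np.linalg.svd(Zs, full_matrices=False)
pc1 = Zs @ Vt[0]
b_pca = ols(y, np.column_stack([ones, x, pc1]))

print("coef on x, OLS raw covariates:         ", b_ols_raw[1])
print("coef on x, OLS standardized covariates:", b_ols[1])
print("coef on x, PCA regression:             ", b_pca[1])
```

With an intercept included, rescaling the covariates does not change the OLS coefficient on x, so standardizing them everywhere costs nothing in the plain OLS specification. It only matters for the PCA step, where the principal components depend on the scale of each column.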

ijyliu commented 3 years ago

@marionoro I have to do work stuff now, but if you're free you can add an intercept back into the sims for consistency. Then I think the sims are good to run again.

paul-opheim commented 3 years ago

I re-ran the simulations with intercepts and edited the charts to contain that new data.