ijyliu / ECMA-31330-Project

Econometrics and Machine Learning Group Project

Add something on the curse of dimensionality as a problem for IV, and exponential transformations as a problem for averaging all measurements, to theory section #92

Closed ijyliu closed 3 years ago

ijyliu commented 3 years ago

this is mentioned in the intro but i don't think it was discussed

paul-opheim commented 3 years ago

Oh, just realized that I didn't add this to the simulation section. I'm not sure if I would know exactly what to say about those things?

ijyliu commented 3 years ago

I think it's more of a theory thing than a simulation thing. In the simulation section you can just cite these two concepts before presenting the table that shows how pca does well

ijyliu commented 3 years ago

@marionoro the N/p thing you discuss is the curse of dimensionality. and what you have on the exponential is fine too

paul-opheim commented 3 years ago

I just added a sentence that explicitly says "The Curse of Dimensionality"

ijyliu commented 3 years ago

notes: can probably get something on curse of dimensionality/too many instruments from class or online. I guess this is just looking for something on the variance of the IV estimator when p and n vary.

for exponential transformations, maybe a toy example? idk

ijyliu commented 3 years ago

doing the exponential just gives more variance to the mismeasurements. but pca does a good job summarizing this variance? while somehow the average does not

ijyliu commented 3 years ago

PCA's optimality of approximation must be at work here: we are introducing nonlinearity with the transform, but PCA is still the best way to model it linearly, better than the average.

This might be what the z or mismeasured covariates look like. pca will be well rotated

image
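A minimal sketch of this intuition (illustrative only, not the project's simulation code; the sizes and noise scale below are made up): build noisy copies of a latent covariate, exponentiate half of them, rescale, and compare how well the equal-weight average and the first principal component track the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 10
x = rng.normal(size=(n, 1))                   # latent true covariate
M = x + rng.normal(scale=0.5, size=(n, p))    # p noisy measurements of x
M[:, p // 2:] = np.exp(M[:, p // 2:])         # half get the exp transform

# Transform first, then rescale (the order discussed below)
Z = (M - M.mean(axis=0)) / M.std(axis=0)

avg = Z.mean(axis=1)                          # equal-weight average
v = np.linalg.svd(Z, full_matrices=False)[2][0]
pc1 = Z @ v                                   # first principal component scores

corr_avg = abs(np.corrcoef(avg, x.ravel())[0, 1])
corr_pc1 = abs(np.corrcoef(pc1, x.ravel())[0, 1])
print(round(corr_avg, 3), round(corr_pc1, 3))
```

Both proxies track x closely in this toy setup; PCA's advantage comes from being free to reweight the columns rather than forcing them to count equally.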

paul-opheim commented 3 years ago

Tbh, I'm not sure if it's tractable to come up with theoretical reasons for why the exponential transformation makes taking the average perform worse. Isn't the simple reason for that trend that without any scaling taking the average will roughly approximate the true value (since it's averaging white noise errors away), but with the transformation the average will no longer be centered on the true value because the scales are so different?
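The centering point can be checked numerically (a quick illustrative sketch): exponentiating mean-zero normal noise gives strictly positive draws whose mean is e^{sigma^2/2} (the lognormal mean), while the median stays at 1, so the distribution is no longer centered where the untransformed one was.

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.normal(loc=0.0, scale=1.0, size=100_000)
ez = np.exp(z)

print(ez.min() > 0)          # strictly positive, always
print(round(ez.mean(), 3))   # close to e^{1/2} ~= 1.649
print(round(np.median(ez), 3))  # close to e^0 = 1: mean != median (skew)
```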

ijyliu commented 3 years ago

I guess if it's e^z, in a mean 0 environment, when you throw in exponentials you will be adding strictly positive items. So yes?

But then if PCA rearranges weightings to explain more variance, won't it be even more wrong

ijyliu commented 3 years ago

No, wait we are rescaling. But I forget if that is before or after the transform

ijyliu commented 3 years ago

Transform is before rescaling.

paul-opheim commented 3 years ago

Yup we transform then rescale then PCA

ijyliu commented 3 years ago

So I guess then it's probably not something to do with the mean, but with the variance of the mismeasured items

paul-opheim commented 3 years ago

I figured it had something to do with this:

image

Taking the exponential of the distribution totally warps the distribution, even if you re-scaled it again later. z here is nothing like x.
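One way to see the warping numerically (an illustrative sketch): skewness is invariant to affine maps, so standardizing exp(x) back to mean 0 and sd 1 leaves its heavy right skew fully intact.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200_000)
z = np.exp(x)
z_std = (z - z.mean()) / z.std()   # rescaled to mean 0, sd 1

def skew(a):
    """Sample skewness (third standardized moment)."""
    a = a - a.mean()
    return (a**3).mean() / (a**2).mean() ** 1.5

print(round(skew(x), 2))       # ~0 for the normal draws
print(round(skew(z_std), 2))   # large and positive: rescaling does not undo the warp
```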

ijyliu commented 3 years ago

So half the covariates are drawn from distributions like x and half from z

paul-opheim commented 3 years ago

Exactly. Where 0 is the true factor I guess?

paul-opheim commented 3 years ago

image

Although x+z has a lower "MAE" and "MSE" than x+x, so I'm not sure I really get the mechanism by which this makes taking the average less effective.

ijyliu commented 3 years ago

In z you are more likely to draw a negative value right

ijyliu commented 3 years ago

Because it's mean not median centered

paul-opheim commented 3 years ago

Correct

paul-opheim commented 3 years ago

More likely to draw a negative but because the standard deviation of e**x is so large, the normalization crams more of the density of the distribution to be closer to 0 (with then a corresponding fatter tail to the right of the distribution than there is in a normal distribution).
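A quick numeric check of this point (illustrative): after standardizing exp(x), well over half of the density sits below the mean, with a correspondingly fat right tail compared to a normal distribution.

```python
import numpy as np

rng = np.random.default_rng(3)
z = np.exp(rng.normal(size=100_000))
z = (z - z.mean()) / z.std()       # standardized: mean 0, sd 1

frac_below = (z < 0).mean()
tail_99 = np.quantile(z, 0.99)
print(round(frac_below, 3))        # well above 0.5: density crammed left of the mean
print(round(tail_99, 2))           # far beyond a standard normal's ~2.33
```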

ijyliu commented 3 years ago

Ugh still not clear to me it's off center when you do a few draws from z and a few from x and average across them

paul-opheim commented 3 years ago

What do you mean? If you add a centered distribution and an off-center one, the sum of those distributions will still be off center (although more centered than just the off-center distribution alone). Example:

image
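The same point in a tiny sketch (made-up draws, for illustration): by linearity of the mean, averaging a centered distribution with an off-center one leaves the result off-center, just less so.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)             # centered at 0
z = np.exp(rng.normal(size=100_000))     # centered at e^{1/2} ~= 1.65, not 0
avg = (x + z) / 2

print(round(x.mean(), 2), round(z.mean(), 2), round(avg.mean(), 2))
```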

paul-opheim commented 3 years ago

image

ijyliu commented 3 years ago

Yeah but what's the mean of (x+z)/2

Or is it the median that matters?

paul-opheim commented 3 years ago

image

paul-opheim commented 3 years ago

Yeah lol I'm getting confused haha. If it's the median that matters, then it makes sense why the transformations decrease performance. I don't quite get why the median would matter though.

paul-opheim commented 3 years ago

Here are the deciles of each distribution:

image

paul-opheim commented 3 years ago

image

ijyliu commented 3 years ago

Notes: doubtful we will be able to develop this theoretically. So, we will try to provide just a little justification within the simulation section.

What we have on curse of dimensionality in the sim section is fine.

However, we should aim to add a sentence on why transforming the variables makes averaging perform poorly.

@marionoro suggested:

We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be better able to deal with this issue)

@nicomarto will give us his thoughts on this

I did some diagnostics but the results were frustrating: https://github.com/ijyliu/ECMA-31330-Project/blob/f8e33fb0a608738c710cf73b9c4ff50337bb08ae/Source/Simulations/exponential_experiments.ipynb

nicomarto commented 3 years ago

ok so what I suspect is it has something to do with this: since PCA tries to max the variance of the linear combinations of X, things clearly change when we take the exponential. If everything is linear, the sample average is basically the linear combination that maximizes the variance when all columns are scaled the same. But when we take exponentials, exp(x+e) and exp(x+u) differ from each other in a different way than x+e and x+u do, so the "amount" of variance that we explain with those variables is different. Under both PCA and the sample average, x+u and x+e get basically the same weight when maximizing the variance, but the weights for exp(x+u) and exp(x+e) are way different

ijyliu commented 3 years ago

Average is equal weight for each measurement. PCA is not and would be more able to adjust weights to capture more variance. It's still tricky though because everything is rescaled so I guess idk why the weights need to be not equal

nicomarto commented 3 years ago

Yeah, the thing is that since all our mismeasured variables are of the form x+e, the best linear combination is very close to the average (in fact, as p goes to infinity the average converges to the true observation). Then the difference between PCA and the average when we are not using exp(x+e) comes from the variance of the estimator, since I believe PCA would slightly shift the weights to favor those measurements where e is close to zero, since they are closer to the true observation.

This does not happen when we take exp(x+e), since now there are clearly substantial differences between the optimal linear combination and the average
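This weighting point can be illustrated with a small sketch (made-up sizes and noise scale, not the project's code): the PC1 loadings come out roughly equal within the untransformed group and within the exponentiated group, but the two groups get visibly different weights, whereas the average would force all six weights equal.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 20_000, 6
x = rng.normal(size=(n, 1))
M = x + rng.normal(scale=0.5, size=(n, p))
M[:, 3:] = np.exp(M[:, 3:])                 # last three columns transformed
Z = (M - M.mean(axis=0)) / M.std(axis=0)

w = np.linalg.svd(Z, full_matrices=False)[2][0]
w = w * np.sign(w.sum())                    # fix the arbitrary sign of PC1
print(np.round(w, 3))                       # near-equal within each group of 3,
                                            # but the two groups differ
```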

nicomarto commented 3 years ago

btw what paul wrote sounds amazing

ijyliu commented 3 years ago

hmm i guess pca does always have lower sd in the table. but it seems like not by a lot. especially relative to the bias

nicomarto commented 3 years ago

I suspected that when doing the theory part, and the simulation seems to support it. Intuitively, what PCA should be doing is leaving the noise behind. Then the estimation of the true variable through PCA should be more efficient than with the sample average, and so PCA "transfers" less variance to the estimated coefficients than the sample average does
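A rough Monte Carlo sketch of this variance-transfer idea (illustrative parameters, not the project's simulation): regress the outcome on each standardized proxy across many replications and compare the spread of the slope estimates. Both proxies show attenuation toward zero, and the dispersion is small for both, consistent with the "not by a lot" point above.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, reps, beta = 1000, 10, 200, 1.0
est_pca, est_avg = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)                 # outcome with noise
    M = x[:, None] + rng.normal(scale=0.5, size=(n, p))
    M[:, p // 2:] = np.exp(M[:, p // 2:])             # half transformed
    Z = (M - M.mean(axis=0)) / M.std(axis=0)
    v = np.linalg.svd(Z, full_matrices=False)[2][0]
    v = v * np.sign(v.sum())                          # fix PC1's arbitrary sign
    for proxy, store in ((Z @ v, est_pca), (Z.mean(axis=1), est_avg)):
        proxy = (proxy - proxy.mean()) / proxy.std()
        # OLS slope of y on the standardized proxy
        store.append((proxy * (y - y.mean())).mean() / proxy.var())

print("pca : mean %.3f sd %.4f" % (np.mean(est_pca), np.std(est_pca)))
print("avg : mean %.3f sd %.4f" % (np.mean(est_avg), np.std(est_avg)))
```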

ijyliu commented 3 years ago

We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be better able to deal with this issue)

alright. so idk if we are better off putting this in the paper or modifying it or leaving things as is

nicomarto commented 3 years ago

what does @marionoro say

ijyliu commented 3 years ago

Here's a modification to consider

We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be capable of optimally adjusting relevant weightings)

paul-opheim commented 3 years ago

I don't really understand the discussion so I defer to y'all haha

nicomarto commented 3 years ago

It sounds nice to me @ijyliu

ijyliu commented 3 years ago

We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp some of the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be capable of optimally adjusting relevant weightings)

warp some of the distributions

nicomarto commented 3 years ago

wait I got lost lol

paul-opheim commented 3 years ago

Wait so are we done here lol?

nicomarto commented 3 years ago

idk lol

ijyliu commented 3 years ago

We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp some of the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be capable of optimally adjusting relevant weightings)

include this?

paul-opheim commented 3 years ago

That sounds good to me!

nicomarto commented 3 years ago

same!

ijyliu commented 3 years ago

image

ok here's the full paragraph

if good go ahead and close

paul-opheim commented 3 years ago

Looks good to me.