Closed. ijyliu closed this issue 3 years ago.
Oh, just realized that I didn't add this to the simulation section. I'm not sure if I would know exactly what to say about those things?
I think it's more of a theory thing than a simulation thing. In the simulation section you can just cite these two concepts before presenting the table that shows how pca does well
@marionoro the N/p thing you discuss is the curse of dimensionality. and what you have on the exponential is fine too
I just added a sentence that explicitly says "The Curse of Dimensionality"
notes: can probably get something on curse of dimensionality/too many instruments from class or online. I guess this is just looking for something on the variance of the IV estimator when p and N vary.
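As a toy illustration of that variance point, here is a sketch of a many-instruments 2SLS simulation. The design, parameter values, and the `two_sls` helper are all assumptions made up for illustration, not the paper's actual setup: the idea is just that as the number of instruments p grows relative to n, the 2SLS estimate drifts toward the biased OLS value.

```python
# Toy sketch: 2SLS with many instruments. All parameter values are
# illustrative assumptions, not taken from the paper's simulations.
import numpy as np

rng = np.random.default_rng(0)

def two_sls(n, p, beta=1.0, rho=0.8, n_sims=200):
    """Mean and sd of the 2SLS estimate over n_sims draws."""
    est = []
    for _ in range(n_sims):
        Z = rng.normal(size=(n, p))               # p valid instruments
        u = rng.normal(size=n)                    # structural error
        v = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)
        x = Z @ np.full(p, 0.6 / np.sqrt(p)) + v  # endogenous regressor
        y = beta * x + u
        xhat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # first stage
        est.append((xhat @ y) / (xhat @ x))              # second stage
    return np.mean(est), np.std(est)

m_few, _ = two_sls(n=200, p=2)     # close to beta = 1
m_many, _ = two_sls(n=200, p=150)  # pulled toward the OLS bias
print(m_few, m_many)
```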
for exponential transformations, maybe a toy example? idk
doing the exponential just gives more variance to the mismeasurements. but pca does a good job summarizing this variance? while somehow the average does not
PCA's optimality of approximation must be at work here, we are introducing nonlinearity with the transform but pca is still the best way to model it linearly, better than the average
This might be what the z or mismeasured covariates look like. pca will be well rotated
Tbh, I'm not sure if it's tractable to come up with theoretical reasons for why the exponential transformation makes taking the average perform worse. Isn't the simple reason for that trend that without any scaling taking the average will roughly approximate the true value (since it's averaging white noise errors away), but with the transformation the average will no longer be centered on the true value because the scales are so different?
I guess if it's e^z, in a mean-0 environment, when you throw in exponentials you will be adding strictly positive items. So yes?
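Right, and that's a one-line check (the σ = 1 here is an assumed value for illustration): if z ~ N(0, σ²), then exp(z) is strictly positive with mean exp(σ²/2), so the transformed measurement is no longer centered on the truth.

```python
# Sketch: exponentiating a mean-zero normal produces a strictly
# positive variable whose mean is exp(sigma^2 / 2), not 0.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, scale=1.0, size=1_000_000)
ez = np.exp(z)

print(z.mean())   # roughly 0
print(ez.mean())  # roughly exp(0.5), about 1.65
```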
But then if PCA rearranges weightings to explain more variance, won't it be even more wrong
No, wait we are rescaling. But I forget if that is before or after the transform
Transform is before rescaling.
Yup we transform then rescale then PCA
So I guess then it's probably not something to do with the mean, but with the variance of the mismeasured items
I figured it had something to do with this:
Taking the exponential of the distribution totally warps the distribution, even if you re-scaled it again later. z here is nothing like x.
So half the covariates are drawn from distributions like x and half from z
Exactly. Where 0 is the true factor I guess?
Although x+z has a lower "MAE" and "MSE" than x+x, so I'm not sure I really get the mechanism by which this makes taking the average less effective.
In z you are more likely to draw a negative value right
Because it's mean not median centered
Correct
More likely to draw a negative but because the standard deviation of e**x is so large, the normalization crams more of the density of the distribution to be closer to 0 (with then a corresponding fatter tail to the right of the distribution than there is in a normal distribution).
Ugh still not clear to me it's off center when you do a few draws from z and a few from x and average across them
What do you mean? If you add a centered distribution and an off-center one, the sum of those distributions will still be off center (although more centered than just the off-center distribution alone). Example:
Yeah but what's the mean of (x+z)/2
Or is it the median that matters?
Yeah lol I'm getting confused haha. If it's the median that matters, then it makes sense why the transformations decrease performance. I don't quite get why the median would matter though.
Here are the deciles of each distribution:
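(For reference, deciles like these could be reproduced with a sketch along these lines; the distributions here, standard normal x and a standardized exponential z, are assumptions and the exact distributions in the notebook may differ:)

```python
# Sketch: deciles of a normal measurement x vs. a measurement z that
# was exponentiated and then rescaled. Distributions are assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
z = np.exp(x)
z = (z - z.mean()) / z.std()  # rescale *after* the transform

qs = np.arange(0.1, 1.0, 0.1)
dx = np.quantile(x, qs)  # roughly symmetric around 0
dz = np.quantile(z, qs)  # median below 0, long right tail
print(dx.round(2))
print(dz.round(2))
```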
Notes: doubtful we will be able to develop this theoretically. So, we will try to provide just a little justification within the simulation section.
What we have on curse of dimensionality in the sim section is fine.
However, we should aim to add a sentence on why transforming the variables makes averaging perform poorly.
@marionoro suggested:
We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be better able to deal with this issue)
@nicomarto will give us his thoughts on this
I did some diagnostics but the results were frustrating: https://github.com/ijyliu/ECMA-31330-Project/blob/f8e33fb0a608738c710cf73b9c4ff50337bb08ae/Source/Simulations/exponential_experiments.ipynb
ok so what I suspect is it has something to do with this: since PCA tries to maximize the variance of the linear combinations of X, things clearly change when we take exponentials. If everything is linear, the sample average is basically the linear combination that maximizes the variance when all the measurements are scaled the same. But when we take exponentials, exp(x+e) and exp(x+u) differ from each other in a different way than x+e and x+u do, so the "amount" of variance that we explain with those variables is different. Under both PCA and the sample average, x+u and x+e get basically the same weight when maximizing the variance, but the weights for exp(x+u) and exp(x+e) are way different
Average is equal weight for each measurement. PCA is not and would be more able to adjust weights to capture more variance. It's still tricky though because everything is rescaled so I guess idk why the weights need to be not equal
Yeah, the thing is that since all our mismeasured variables are of the form x+e, the best linear combination is very close to the average (in fact, as p goes to infinity the average converges to the true observation). Then the difference between PCA and the average when we are not using exp(x+e) comes from the variance of the estimator, since I believe PCA would slightly change the weights to favor those measurements where e is close to zero, since they are closer to the true observation.
This does not happen when we take exp(x+e), since now there are clearly substantial differences between the optimal linear combination and the average
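A sketch of that weighting claim (the measurement-error scale and the 50/50 linear/exponential split are assumptions for illustration): with half the measurements left as x + e and half exponentiated, then everything standardized, the first principal component no longer weights the columns equally, while the average always does.

```python
# Sketch: PCA loadings on linear vs. exponentiated measurements of the
# same true factor. Error scale and the 50/50 split are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 5_000, 10
x = rng.normal(size=n)                               # true factor
meas = x[:, None] + rng.normal(scale=0.5, size=(n, p))
meas[:, p // 2:] = np.exp(meas[:, p // 2:])          # transform half
meas = (meas - meas.mean(0)) / meas.std(0)           # rescale after

# first principal component loadings via SVD
_, _, vt = np.linalg.svd(meas, full_matrices=False)
w = np.abs(vt[0])
print(w[:p // 2].mean())  # weight on the linear measurements
print(w[p // 2:].mean())  # weight on the exponentiated ones
```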
btw what paul wrote sounds amazing
hmm i guess pca does always have lower sd in the table. but it seems like not by a lot. especially relative to the bias
I suspected that when doing the theory thing, and the simulation seems to support it. Intuitively, what PCA should be doing is trying to leave all the noise behind. Then the estimation of the true variable through PCA should be more efficient than when using the sample average, and so PCA "transfers" less variance to the estimated coefficients than the sample average does
We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be better able to deal with this issue)
alright. so idk if we are better off putting this in the paper or modifying it or leaving things as is
what does @marionoro say
Here's a modification to consider
We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be capable of optimally adjusting relevant weightings)
I don't really understand the discussion so I defer to y'all haha
It sounds nice to me @ijyliu
We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp some of the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be capable of optimally adjusting relevant weightings)
warp some of the distributions
wait I got lost lol
Wait so are we done here lol?
idk lol
We would expect PCR to outperform taking the average of all covariate measurements under this multiple-scale measurement framework as these transformations would warp some of the measurement error distributions around the true covariate in a way that a simple average would not be able to capture (whereas PCA would be capable of optimally adjusting relevant weightings)
include this?
That sounds good to me!
same !
ok here's the full paragraph
if good go ahead and close
Looks good to me.
this is mentioned in the intro but i don't think it was discussed