Comments on Simulation Code

ijyliu / ECMA-31330-Project

Econometrics and Machine Learning Group Project


Comments on Simulation Code #79

Closed ijyliu closed 3 years ago

ijyliu commented 3 years ago

[x] According to #64, we should be taking the log of the true variable and rescaling, then doing the comparison. But at the moment we are taking e to the power of the mismeasured variable as our transform. I mean, I guess doing the exponential gives us the change in the order of magnitude of items that we want.
[x] Another question is whether to do the logging/transformation before or after adding in the error. I think in the current code it's after the error has been added, which seems to make sense; otherwise you would need to scale/fiddle with the error structure a lot. Though the comment in #64 said to transform the true variable.
[x] I don't see the rescaling (by -1 perhaps) mentioned in #64? Except for maybe in the IV formula? Not clear to me what that -1 is.

paul-opheim commented 3 years ago

1) I ended up taking the exponential instead of the log because, as you say, it gives us a change in the order of magnitude without the complications of taking the log (what do we do when the number is negative?).

2) I also took the transformation of the value with the error because, like you say, it makes it so that we don't have to mess with the errors a bunch.

3) I interpreted the rescaling as being "and/or" with the exponential, so taking the exponential seemed good enough to me.
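
For reference, a minimal sketch of the ordering described in points 1) and 2): add the measurement error first, then exponentiate. This is not the project's actual code. The function names, the Gaussian error, and the seed are all assumptions made for illustration.

```python
import math
import random


def mismeasure_then_exponentiate(true_values, error_sd, seed=0):
    """Add error to each true value, then take exp of the noisy value.

    Assumption: the error is additive Gaussian noise with standard
    deviation error_sd. The thread does not specify the error structure.
    """
    rng = random.Random(seed)
    noisy = [x + rng.gauss(0.0, error_sd) for x in true_values]
    # exp is defined for every real number, so negative noisy values are fine.
    return [math.exp(v) for v in noisy]


def safe_log(value):
    """Return log(value), or None when value is not strictly positive."""
    return math.log(value) if value > 0 else None


true_values = [-2.0, -0.5, 0.0, 1.5]
transformed = mismeasure_then_exponentiate(true_values, error_sd=1.0)
```

The exponential always returns a strictly positive number, while `safe_log` has to special-case non-positive inputs, which is the complication mentioned in point 1).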

ijyliu commented 3 years ago

Why is there a -1 in the IV formula?

I guess maybe you can do the absolute value and then the log, but I agree that seems weird.

paul-opheim commented 3 years ago

Where is there a -1?

ijyliu commented 3 years ago

second line

ijyliu commented 3 years ago

Also, potentially dumb question, but what are the ppts in the APEs part of the tables?

paul-opheim commented 3 years ago

Ah. The -1 is so that the regression does not include an intercept.
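
In formula interfaces such as R's and patsy's (the one statsmodels uses), a `- 1` on the right-hand side drops the intercept. Which interface the IV formula here was written for is not stated in this thread. The plain-Python sketch below, with hypothetical helper names and made-up data, shows what changes numerically when a one-regressor fit is forced through the origin.

```python
def slope_through_origin(x, y):
    """OLS slope for y = b*x with no intercept (what '- 1' asks for)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def slope_and_intercept(x, y):
    """OLS slope and intercept for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative data generated from y = 1 + 2x, so the true intercept is nonzero.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

slope_no_intercept = slope_through_origin(x, y)
slope_with_intercept, intercept = slope_and_intercept(x, y)
```

When the data really do have a nonzero intercept, dropping it changes the slope estimate, so whether to include one should be decided the same way across the regressions being compared.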

The ppts in the parentheses are the standard deviations of the absolute percentage error. The idea being that the main number is the mean and then the number in the parentheses is the standard deviation (for the coefficient and the APE). Perhaps I should say that somewhere?

ijyliu commented 3 years ago

Oh, I might need to fix some regressions then

I don't know that the standard deviation of the APE makes much sense conceptually? I mean, we have the standard deviation of the actual coefficient.

paul-opheim commented 3 years ago

Ah, that's a fair point. I'll just report the MAPEs then.
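
A minimal sketch of the MAPE across simulation draws, assuming it is computed as the mean of the absolute percentage errors of the estimated coefficient relative to the true coefficient, expressed in percentage points. The function names and the example numbers are hypothetical, not taken from the project's tables.

```python
from statistics import mean


def absolute_percentage_errors(estimates, true_value):
    """Return |estimate - truth| / |truth| * 100 for each estimate."""
    if true_value == 0:
        raise ValueError("percentage error is undefined when the true value is 0")
    return [abs(e - true_value) / abs(true_value) * 100 for e in estimates]


def mape(estimates, true_value):
    """Mean absolute percentage error over all simulation draws."""
    return mean(absolute_percentage_errors(estimates, true_value))


# Hypothetical coefficient estimates from three simulation runs.
example_estimates = [0.9, 1.1, 1.2]
example_mape = mape(example_estimates, true_value=1.0)
```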

ijyliu commented 3 years ago

Ok, the new automated code just does the MAPEs

I need to go check those intercepts now

In simulations code:

[x] may want to remove intercept from IV first stage

In empirical code

[x] may want to remove intercept from all items except for IV, which already doesn't have one

paul-opheim commented 3 years ago

Did you remove the intercept from the first stage of the IV when you ran the n=3,000 simulations? It doesn't seem like it based on Run_Simulations.ipynb, but I also don't know if we should have an intercept in that regression?

ijyliu commented 3 years ago

No, I didn't remove it from the first stage in the simulations. I don't know if we should.

ijyliu commented 3 years ago

Remaining points in this issue are obsoleted by #87, as we will be using statsmodels IV.