ijyliu / ECMA-31330-Project

Econometrics and Machine Learning Group Project

Missing Variables in PCA #9

Closed ijyliu closed 3 years ago

ijyliu commented 3 years ago

Just a heads up: we are going to have a real missing data problem with these panel datasets, because standard PCA can't be run on data with NaN values.

http://pbil.univ-lyon1.fr/members/dray/files/articles/dray2015a.pdf

We can try to impute as much as we can to save observations.

ijyliu commented 3 years ago

I've done some interpolation, which helps a little. (I only want to interpolate internally, i.e. between known observations, because extrapolating beyond them is sketchier.)
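For reference, a minimal sketch of what interior-only interpolation could look like in pandas. The column names (`country`, `year`, `gdp`) and the toy values are made up for illustration, not taken from our data. `limit_area="inside"` restricts filling to gaps surrounded by known values, so leading and trailing NaNs are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: two countries, five years each, one indicator.
panel = pd.DataFrame(
    {
        "country": ["A"] * 5 + ["B"] * 5,
        "year": list(range(2000, 2005)) * 2,
        "gdp": [np.nan, 1.0, np.nan, 3.0, np.nan,
                2.0, np.nan, np.nan, 5.0, 6.0],
    }
)

# Sort so that interpolation runs along time within each country.
panel = panel.sort_values(["country", "year"])

# Linear interpolation within each country, interior gaps only:
# NaNs before the first or after the last known value stay NaN.
panel["gdp_interp"] = panel.groupby("country")["gdp"].transform(
    lambda s: s.interpolate(method="linear", limit_area="inside")
)

print(panel)
```

Here country A keeps its missing first and last years, while the gaps between known values are filled. Note that `method="linear"` treats the rows as equally spaced, so this assumes there are no skipped years within a country.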

Another thing we can do: figure out, mathematically, the largest rectangular panel we can build (i.e. the one with the greatest area). We could also get at this with a plot.

image

Basically, sort this picture so that the longest black bars are on the left. This will create a curve/triangle. Then, try to draw the rectangle that maximizes the area within this triangle.
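A rough sketch of the sort-then-fit-a-rectangle idea, assuming each bar can be summarized by a single length (say, the number of observations available for a series). The function name `best_rectangle` and the example lengths are hypothetical. After sorting longest-first, a rectangle spanning the first `k` bars can be at most as tall as the `k`-th bar, so its area is `k * heights[k-1]`, and we just take the maximum over `k`.

```python
import numpy as np

def best_rectangle(bar_lengths):
    """Largest rectangle under bars sorted longest-first.

    Returns (width, height, area), where width is the number of bars
    kept and height is the length of the shortest bar among them.
    """
    # Sort bar lengths in descending order (longest bars on the left).
    heights = np.sort(np.asarray(bar_lengths))[::-1]
    # Candidate widths 1..n; the k-th candidate is limited by the k-th bar.
    widths = np.arange(1, len(heights) + 1)
    areas = widths * heights
    k = int(np.argmax(areas))
    return int(widths[k]), int(heights[k]), int(areas[k])

# Hypothetical bar lengths, e.g. observations available per series.
lengths = [10, 3, 8, 8, 1]
print(best_rectangle(lengths))  # (3, 8, 24)
```

This only gives the true maximal complete panel if the bars line up (for example, every series ends in the same year and differs only in its start year). If the gaps are scattered, it is an upper bound, and the actual overlap would still need to be checked.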

ijyliu commented 3 years ago

If we were to set the missing values to the mean, could that prevent effects in the regression?
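A small sketch of what mean imputation does to a single variable, using made-up numbers. It leaves the mean unchanged but shrinks the variance, since every filled-in value sits exactly at the center. This doesn't settle the regression question, it just shows the mechanical effect on the imputed column.

```python
import numpy as np

# Hypothetical indicator with two missing values.
x = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 6.0])

# Mean of the observed values only.
observed_mean = np.nanmean(x)

# Mean imputation: replace each NaN with the observed mean.
x_filled = np.where(np.isnan(x), observed_mean, x)

# Sample variances before (observed values only) and after imputation.
var_observed = np.nanvar(x, ddof=1)
var_filled = np.var(x_filled, ddof=1)

print(observed_mean, x_filled.mean())  # the mean is preserved
print(var_observed, var_filled)        # the variance shrinks
```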

ijyliu commented 3 years ago

Grouped lasso?

ijyliu commented 3 years ago

This is also a problem for the measurement error idea: I have to throw out a ton of countries because they are missing even one indicator!