kenneth-rios / mixed-panel-logit-default-risk

Forecasting Sovereign External Debt Default via Mixed Panel Logit Simulation

Mixed panel logit theory #3

Open kenneth-rios opened 5 years ago

kenneth-rios commented 5 years ago

Mixed panel logit model assuming random coefficients across countries and year fixed effects.

A mixed model is necessary to estimate unconditional probabilities (no such luck with conditional logit). A panel estimator is necessary so that the per-period probabilities can be combined over time (a product of probabilities, i.e. a sum in logs) before finding the unconditional probabilities.

See PDF on how to go from unconditional probabilities to log-likelihood for MLE estimation.

kenneth-rios commented 5 years ago

Alternatives: $j = 1$ (default) or $j = 0$ (no default)!

For each country $n$ and year $t$, sum over both alternatives $j$. In our data, the chosen alternative is recorded by the default variable. The denominator is the sum of the exponentiated indices for $j = 0$ and $j = 1$ (for each $nt$), and the numerator is the exponentiated index of the alternative actually chosen in year $t$.
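Concretely, in my notation (covariates $x_{nt}$, year fixed effect $\alpha_t$, with the index of the no-default alternative normalized to zero), the per-period choice probability is the standard binary logit:

$$
L_{nt}(j \mid \beta_n) = \frac{e^{\,j\,(\beta_n' x_{nt} + \alpha_t)}}{1 + e^{\,\beta_n' x_{nt} + \alpha_t}}, \qquad j \in \{0, 1\}.
$$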

Then, because the logistically distributed errors are independent, we can multiply these logit probabilities across all time periods to estimate the probability of default conditional on $\beta$. We then integrate over $\beta$, assumed to be drawn from a joint normal distribution, to estimate the unconditional probability of default for each decision-maker (country) in each year.
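In symbols (same illustrative notation), with $y_n = (y_{n1}, \dots, y_{nT})$ the observed default sequence of country $n$:

$$
P(y_n \mid \beta) = \prod_{t=1}^{T} L_{nt}(y_{nt} \mid \beta), \qquad P(y_n) = \int P(y_n \mid \beta)\, f(\beta)\, d\beta .
$$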

kenneth-rios commented 5 years ago

$\beta_n$ means that the slopes on the observed data vary across the countries in the population according to some joint normal distribution $f(\beta)$, which requires estimating the vector of means $\mu$ and the covariance matrix $\Sigma$. With $\Sigma$ taken to be diagonal (as below), each variable implies two parameters to be estimated. This becomes part of the theoretical log-likelihood. Eventually, we estimate the parameters by MLE on a simulated log-likelihood function.
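Written out, under that independence assumption:

$$
\beta_n \sim N(\mu, \Sigma), \qquad \Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_k^2),
$$

so the $k$ random coefficients contribute the $2k$ parameters $(\mu_1, \sigma_1, \dots, \mu_k, \sigma_k)$ to the log-likelihood.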

The simulated log-likelihood function used for MLE characterizes the (1000) draws for the $k$ random coefficients as coming from independent normal distributions with mean $\mu_i$ and standard deviation $\sigma_i$ for $i = 1, \dots, k$. The draws are themselves functions of the parameters $\mu_i$ and $\sigma_i$, so the simulated log-likelihood is a function of $\mu_i$, $\sigma_i$, and $\alpha_t$ for $t = 1, \dots, T$ (the $\alpha_t$ are the year fixed effects, which require no distribution or simulation).
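A minimal sketch of this simulated log-likelihood (illustrative only: the array shapes, variable names, and the log-scale parameterization of the standard deviations are my assumptions, not the repository's code):

```python
import numpy as np

def simulated_loglik(params, X, y, Z):
    """Simulated log-likelihood for the mixed panel logit.

    params : concatenation of (mu, log_sigma, alpha): k means, k log std devs,
             T year fixed effects
    X      : (N, T, k) covariates for each country-year
    y      : (N, T) binary default indicator
    Z      : (R, k) fixed standard-normal draws, reused across optimizer
             iterations so the objective stays smooth in (mu, sigma)
    """
    N, T, k = X.shape
    mu, log_sigma, alpha = params[:k], params[k:2 * k], params[2 * k:]
    sigma = np.exp(log_sigma)                        # keep std devs positive

    # The draws are functions of the parameters: beta_r = mu + sigma * z_r
    beta = mu + sigma * Z                            # (R, k)

    # Index for every draw, country, and year, plus the year fixed effect
    v = np.einsum('rk,ntk->rnt', beta, X) + alpha[None, None, :]   # (R, N, T)
    p1 = 1.0 / (1.0 + np.exp(-v))                    # P(default | beta_r)

    # Probability of the observed choice each year, multiplied over the panel
    p_obs = np.where(y[None, :, :] == 1, p1, 1.0 - p1)
    seq_prob = p_obs.prod(axis=2)                    # (R, N)

    # Averaging over draws approximates the integral over f(beta)
    uncond = seq_prob.mean(axis=0)                   # (N,)
    return np.log(np.clip(uncond, 1e-300, None)).sum()
```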

The simulated log-likelihood is augmented by an L2 regularization penalty term, i.e. a penalty on the sum of squares of the parameters (~~or is it the sum of the squares of the coefficients, which are themselves functions of the parameters?~~ see: https://stackoverflow.com/questions/46244095/conditional-logit-for-panel-data-in-python/48470963?noredirect=1#comment94066414_48470963). This defines our objective function, which is then maximized over the parameters listed above. The maximization is carried out numerically with the Nelder-Mead algorithm. We loop over candidate $\lambda$s using a grid search and then choose the model that returns the highest log-likelihood as our preferred model.
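A sketch of the penalized objective and the grid search, reusing `simulated_loglik` from above (the penalty here is on the parameters themselves; `fit`, its arguments, and the optimizer options are illustrative assumptions, not the project's implementation):

```python
import numpy as np
from scipy.optimize import minimize

def fit(X, y, Z, lambdas, T, k, seed=0):
    """Grid search over the L2 penalty weight lambda; each candidate model is
    estimated by maximizing the penalized simulated log-likelihood with
    Nelder-Mead (via minimizing its negative)."""
    rng = np.random.default_rng(seed)
    x0 = rng.normal(scale=0.1, size=2 * k + T)        # start: (mu, log_sigma, alpha)

    results = {}
    for lam in lambdas:
        def neg_objective(params, lam=lam):
            # L2 penalty on the parameters (means, log std devs, fixed effects)
            return -simulated_loglik(params, X, y, Z) + lam * np.sum(params ** 2)

        results[lam] = minimize(neg_objective, x0, method='Nelder-Mead',
                                options={'maxiter': 20000, 'xatol': 1e-6, 'fatol': 1e-6})
    return results
```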

After the model parameters are estimated, the predicted unconditional probabilities are computed on the test data using the expression for the simulated unconditional probabilities, again with 1000 draws. This time the draws are not functions of unknown parameters: each coefficient is drawn from its estimated independent normal distribution with mean $\hat{\mu}_i$ and standard deviation $\hat{\sigma}_i$.
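A sketch of this prediction step under the same assumptions ($\hat{\mu}$, $\hat{\sigma}$, $\hat{\alpha}$ are arrays taken from the fitted model; `predict_unconditional` is a hypothetical helper, not code from this repo):

```python
import numpy as np

def predict_unconditional(mu_hat, sigma_hat, alpha_hat, X_test, n_draws=1000, seed=1):
    """Simulated unconditional default probabilities on the test panel.
    Draws now come from the estimated independent normals rather than being
    functions of unknown parameters."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(mu_hat, sigma_hat, size=(n_draws, len(mu_hat)))   # (R, k)
    v = np.einsum('rk,ntk->rnt', beta, X_test) + alpha_hat[None, None, :]
    p1 = 1.0 / (1.0 + np.exp(-v))      # per-draw default probability each year
    return p1.mean(axis=0)             # average over draws: (N_test, T)
```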

kenneth-rios commented 5 years ago

We can look into 5- or 10-fold cross-validation to estimate the best $\lambda$ to use, interpreting the L2 penalization as placing a Gaussian prior on the means, standard deviations, and non-random coefficients, thus implementing MAP estimation (Bayesian regression).

The optimal $\hat{\lambda}$ is thus determined by CV using the negative log-likelihood as the loss function. See the PDF and the discussion in https://stats.stackexchange.com/questions/176720/estimate-the-tuning-parameter-in-ridge-logistic-regression for an alternative statistic, the Akaike Information Criterion.
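One possible shape for that cross-validation loop, reusing the `fit` and `simulated_loglik` sketches above (folds taken over countries; everything here is an illustrative assumption rather than the project's implementation):

```python
import numpy as np

def cv_select_lambda(X, y, Z, lambdas, T, k, n_splits=5, seed=0):
    """Pick lambda by k-fold CV over countries, scoring each fold with the
    held-out negative simulated log-likelihood."""
    N = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(N), n_splits)
    losses = {lam: 0.0 for lam in lambdas}
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(N), test_idx)
        for lam in lambdas:
            res = fit(X[train_idx], y[train_idx], Z, [lam], T, k)[lam]
            losses[lam] += -simulated_loglik(res.x, X[test_idx], y[test_idx], Z)
    return min(losses, key=losses.get)     # lambda with the lowest CV loss
```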

kenneth-rios commented 5 years ago

Theory Sections: