ldpape / iOLS_delta

The repository displays all programs to estimate iOLS, i2SLS, as well as their associated tests.

Automatically search for the best hyper-parameter (delta)? #3

Open fabrizioleone opened 1 year ago

fabrizioleone commented 1 year ago

Hi,

Thanks for your work and for sharing this great package.

I would like to use the iOLS_delta_HDFE function in a setting in which the "log of zero" problem arises. I understand I need to specify a value for the hyper-parameter $\delta$ and that the function iOLS_delta_HDFE_test provides a way to select the best $\delta$ given my data.

My understanding is that one should try different $\delta$ values and select the one that maximises the lambda statistic that iOLS_delta_HDFE_test computes (page 23 of your paper). Is this right?

I am currently doing the following. I specify a grid for $\delta$ and loop over each element, searching for the value that maximises the lambda statistic. However, this "greedy search" is very expensive and the results obviously depend on the chosen grid. Is there a way to automatically select the best $\delta$?

I provide below a minimal example of my current approach with simulated data.


* Housekeeping
clear   all
eststo  clear
set     seed 123
set     maxvar  32767
set     emptycells drop

* Create (I x T) panel indicators
local   I = 50
local   T = 10
set     obs `I'
gen     i = _n
expand  `T'
bys     i: gen t = _n

* Simulate variables
gen     u = rnormal(0,1) 
gen     x = (runiform() > 0.5)
bys     i (t): gen a_i = rnormal(0,1)
bys     i (t): replace a_i = a_i[1]
gen     y = 1.0 + a_i + 0.5 * x + u
reghdfe y x, abs(i) 

* Introduce zeros
replace y = 0 if y < 0

* Cleaning
keep    i t y x 
order   i t y x 
gsort   i t

* Initialize locals and output
local  deltas 0.1 0.5 1.0 
local  delta_count : word count `deltas'
local  bsrep = 5    
mat    lambdas = J(`delta_count',2,.)

* Loop over values of delta
local j = 1
foreach d of local deltas {

  * Compute the lambda statistic for this delta
  bootstrap lambda_stat = e(lambda), reps(`bsrep'): iOLS_delta_HDFE_test y x, delta(`d') absorb(i)

  * Store matrix
  matrix lambda = e(b)
  scalar lambda = lambda[1,1]
  mat    lambdas[`j', 1] = `d'
  mat    lambdas[`j', 2] = lambda

  * Update counter
  local ++j

} 

* Show matrix of Lambdas
matrix colnames lambdas = delta lambda_stat
matrix list lambdas

/*
         delta  lambda_stat
r1          .1    1.0192147
r2          .5    1.0208472
r3           1    1.0209707
*/

In this simple example, I would select $\delta = 0.1$. However, I would like to search over a much finer and larger grid. Any help is much appreciated. Thank you.

ldpape commented 1 year ago

Hello,

Thank you very much for your questions.

To select $\delta$, you would ideally pick the value that gives the smallest t-score when testing $\lambda = 1$. This is very expensive, because you need to calculate the standard error associated with this test for every candidate. A more economical approach is to simply take the $\delta$ that gives the $\lambda$ closest to 1 (in absolute value).
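As a rough illustration of the cheaper rule, the sketch below picks the $\delta$ that minimises $|\lambda - 1|$ from a two-column matrix of (delta, lambda) pairs. The numbers are the three rows printed in the question; the matrix name `lambdas` follows the question's code, and the local names `best_dist` and `best_delta` are only illustrative.

```stata
* Lambda statistics as reported in the question (columns: delta, lambda_stat)
matrix lambdas = (0.1, 1.0192147 \ 0.5, 1.0208472 \ 1.0, 1.0209707)

* Keep the delta whose lambda is closest to 1 in absolute value
local best_dist  = .
local best_delta = .
forvalues r = 1/`=rowsof(lambdas)' {
    local dist = abs(lambdas[`r', 2] - 1)
    * Missing (.) compares as larger than any number, so the first row always wins initially
    if `dist' < `best_dist' {
        local best_dist  = `dist'
        local best_delta = lambdas[`r', 1]
    }
}
display "Selected delta = `best_delta'"
```

With these three rows the rule selects $\delta = 0.1$, since 1.0192147 is the value nearest to 1.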

When we worked on the applications in the paper, we found that a grid of $\delta$ values of the form {exp(-4), exp(-2), ... , exp(6)} was effective, but you may want finer increments. You may also gain speed when iterating over different $\delta$ by setting the starting values close to your previous estimates.
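A minimal sketch of looping over that exponential grid is given below. It assumes the data from the question (`y`, `x`, `i`) are in memory, reuses the `iOLS_delta_HDFE_test` call exactly as written there, and assumes the command leaves the statistic in `e(lambda)`, as the question's `bootstrap` call implies. Passing starting values is not shown, because the relevant option is not named in this thread.

```stata
* Exponents -4, -2, ..., 6 give six grid points: exp(-4), exp(-2), ..., exp(6)
matrix lambdas = J(6, 2, .)
matrix colnames lambdas = delta lambda_stat

local j = 0
forvalues k = -4(2)6 {
    local ++j
    local d = exp(`k')

    * Call copied from the question; e(lambda) is assumed to be returned by the command
    quietly iOLS_delta_HDFE_test y x, delta(`d') absorb(i)

    * Store the candidate delta and its lambda statistic
    matrix lambdas[`j', 1] = `d'
    matrix lambdas[`j', 2] = e(lambda)
}

matrix list lambdas
```

For finer increments, shrink the step in the `forvalues` range (for example `-4(1)6`) and resize the `J()` matrix to the new number of grid points.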

Let me know if I can help in some other way.

Kind regards, Louis