sergiocorreia / ppmlhdfe

Poisson pseudo-likelihood regression with multiple levels of fixed effects
http://scorreia.com/software/ppmlhdfe/
MIT License

Algorithm not working on example separation datasets #6

Open · luispfonseca opened this issue 4 years ago

luispfonseca commented 4 years ago

Hi to all,

I was implementing the separation algorithm myself and testing it on the example datasets. Following the example code in https://github.com/sergiocorreia/ppmlhdfe/blob/master/guides/separation_primer.md, I found that for some datasets the observations the algorithm flags as separated differ from those marked in the dataset's own separated column. Please see these two examples (03 and 04):

import delimited https://raw.githubusercontent.com/sergiocorreia/ppmlhdfe/master/test/separation_datasets/03.csv, clear

* Run IR (iterative rectifier) algorithm
loc tol = 1e-5
gen u =  !y
su u, mean
loc K = ceil(r(sum) / `tol' ^ 2)
gen w = cond(y, `K', 1) 

while 1 {
    qui reghdfe u [fw=w], absorb(id1 id2 id3) resid(e)
    predict double xb, xbd
    qui replace xb = 0 if abs(xb) < `tol'

    * Stop once all predicted values become non-negative
    qui cou if xb < 0
    if !r(N) {
        continue, break
    }

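    * Rectify: keep the non-negative part of the fit as the new u, then drop
    * this round's fit and residuals (w is still needed on the next pass)
    replace u = max(xb, 0)
    drop xb e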
}

rename xb z
gen is_sep = z > 0
list
assert separated == is_sep

(1 contradiction)
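For anyone reimplementing the loop outside Stata, as the reporter was doing, it can be sketched in Python. This is a minimal, hypothetical sketch, not ppmlhdfe's code: reghdfe is replaced by weighted least squares on an explicit matrix of fixed-effect dummies (numpy.linalg.lstsq on rows scaled by the square root of the weights). That is only practical for small datasets, and it does not reproduce other reghdfe behaviour such as its default dropping of singleton groups. The names fe_design and iterative_rectifier are illustrative.

```python
import numpy as np


def fe_design(*ids):
    """One-hot dummy columns for every level of every fixed effect (hypothetical helper)."""
    cols = []
    for g in ids:
        levels, inv = np.unique(np.asarray(g), return_inverse=True)
        cols.append(np.eye(len(levels))[inv])
    return np.hstack(cols)


def iterative_rectifier(y, ids, tol=1e-5, max_iter=1000):
    """Sketch of the IR loop from the Stata example above.

    Assumption: reghdfe is replaced by weighted least squares on an explicit
    dummy matrix, so this only suits small illustrative datasets.
    Returns (is_sep, z), mirroring the Stata variables of the same names.
    """
    y = np.asarray(y, dtype=float)
    X = fe_design(*ids)
    u = (y == 0).astype(float)               # gen u = !y
    K = np.ceil(u.sum() / tol ** 2)          # loc K = ceil(r(sum) / `tol' ^ 2)
    w = np.where(y != 0, K, 1.0)             # gen w = cond(y, `K', 1)
    sw = np.sqrt(w)
    for _ in range(max_iter):
        # Weighted LS fit of u on the dummies: scale rows by sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], u * sw, rcond=None)
        xb = X @ beta
        xb[np.abs(xb) < tol] = 0.0           # replace xb = 0 if abs(xb) < `tol'
        if not (xb < 0).any():               # stop once all fitted values are >= 0
            return xb > 0, xb
        u = np.maximum(xb, 0.0)              # replace u = max(xb, 0)
    raise RuntimeError("IR did not converge within max_iter iterations")


if __name__ == "__main__":
    # One fixed effect: group 1 contains only zeros
    print(iterative_rectifier([0, 0, 1, 2, 0, 3], [[1, 1, 2, 2, 3, 3]])[0])
    # Two fixed effects: the middle observation is separated only jointly
    print(iterative_rectifier([1, 0, 1], [[1, 1, 2], [1, 2, 2]])[0])
```

With a single fixed effect, the separated observations are exactly those in groups where every y is zero. In the second call, neither fixed effect on its own has an all-zero group, yet the middle observation is still separated once both are absorbed together.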

import delimited https://raw.githubusercontent.com/sergiocorreia/ppmlhdfe/master/test/separation_datasets/04.csv, clear

* Run IR (iterative rectifier) algorithm
loc tol = 1e-5
gen u =  !y
su u, mean
loc K = ceil(r(sum) / `tol' ^ 2)
gen w = cond(y, `K', 1) 

while 1 {
    qui reghdfe u [fw=w], absorb(id1 id2) resid(e)
    predict double xb, xbd
    qui replace xb = 0 if abs(xb) < `tol'

    * Stop once all predicted values become non-negative
    qui cou if xb < 0
    if !r(N) {
        continue, break
    }

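    * Rectify: keep the non-negative part of the fit as the new u, then drop
    * this round's fit and residuals (w is still needed on the next pass)
    replace u = max(xb, 0)
    drop xb e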
}

rename xb z
gen is_sep = z > 0
list
assert separated == is_sep

(2 contradictions)
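The z produced by the loop is meant to act as a separation certificate: a vector z = Xγ built from the fixed effects, with z ≥ 0 everywhere and z = 0 wherever y > 0. Every observation with z > 0 is then separated. When the algorithm and a dataset's separated column disagree, one way to tell which side is at fault is to check the conditions directly. The sketch below is hypothetical (the helper names are illustrative, and the dummy-matrix span test via numpy.linalg.lstsq is only suitable for small datasets).

```python
import numpy as np


def fe_dummies(*ids):
    """One-hot dummy columns for every level of every fixed effect (hypothetical helper)."""
    cols = []
    for g in ids:
        levels, inv = np.unique(np.asarray(g), return_inverse=True)
        cols.append(np.eye(len(levels))[inv])
    return np.hstack(cols)


def is_certificate(y, ids, z, tol=1e-5):
    """Check whether z certifies separation (sketch; tolerance handling is an assumption)."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    X = fe_dummies(*ids)
    # 1) z must be a linear combination of the fixed-effect dummies
    gamma, *_ = np.linalg.lstsq(X, z, rcond=None)
    in_span = np.allclose(X @ gamma, z, atol=tol)
    # 2) z must be non-negative everywhere
    nonneg = bool((z >= -tol).all())
    # 3) z must be zero wherever y > 0
    zero_on_pos = bool((np.abs(z[y > 0]) <= tol).all())
    return in_span and nonneg and zero_on_pos
```

This only verifies a given z; it does not search for one. An observation marked as separated in a dataset but left at z = 0 by the algorithm is only confirmed once some certificate with a positive value at that observation is found.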

Could you tell me whether: 1) the algorithm involves something not captured in the example code, which would flag those observations differently; 2) there is an error in the example datasets; or 3) those observations are flagged differently by one of the other methods and, if so, how to interpret that?

Thanks again for this package. It's great!

Luís

sergiocorreia commented 4 years ago

Hi Luis,

Thanks for the report. I just started an extended leave, so I will be out for at least a week or so, with limited connectivity. I'll look at it in more detail then; if I don't, feel free to send me an email (GitHub doesn't have a snooze function, so sometimes things fall through the cracks).

Cheers, Sergio