harrysouthworth / gbm

Gradient boosted models
Other
106 stars 27 forks source link

Bug in CoxPH cpp code??? #22

Closed jeffwong closed 9 years ago

jeffwong commented 10 years ago

The CoxPH cpp code to calculate deviance does not agree with the equation listed in the vignette. The function

double CCoxPH::Deviance ( double adT, double adDelta, double adOffset, double adWeight, double *adF, unsigned long cLength, int cIdxOff ) { unsigned long i=0; double dL = 0.0; double dF = 0.0; double dW = 0.0; double dTotalAtRisk = 0.0;

dTotalAtRisk = 0.0;
for(i=cIdxOff; i<cLength+cIdxOff; i++)
{
    dF = adF[i] + ((adOffset==NULL) ? 0.0 : adOffset[i]);
    dTotalAtRisk += adWeight[i]*exp(dF);
    if(adDelta[i]==1.0)
    {
        dL += adWeight[i]*(dF - log(dTotalAtRisk));
        dW += adWeight[i];
    }
}

return -2*dL/dW;

}

has two key values, dL and dW. dL is calculating

\sum (w_i \delta_i (f(x_i) - \log R_i)

and dW is calculating

\sum w_i \delta_i

If we expand the formula for deviance in the vignette we would get

-2 \sum (w_i \delta_i [ f(x_i) - \log R_i + \log w_i ] )

the calculation for dL is valid but dW seems to be wrong. We would want dw to be calculating

\sum (w_i \log w_i)

and the final answer should combine the values dL and dW via

-2 * (dL + dW)

harrysouthworth commented 10 years ago

Thanks for this. Much appreciated. I don't know when I'll find time to dig into it, but thanks very much for reporting it.

Harry

On 15 June 2014 06:38, Jeffrey Wong notifications@github.com wrote:

The CoxPH cpp code to calculate deviance does not agree with the equation listed in the vignette. The function

double CCoxPH::Deviance ( double adT, double adDelta, double adOffset, double adWeight, double *adF, unsigned long cLength, int cIdxOff ) { unsigned long i=0; double dL = 0.0; double dF = 0.0; double dW = 0.0; double dTotalAtRisk = 0.0;

dTotalAtRisk = 0.0; for(i=cIdxOff; i<cLength+cIdxOff; i++) { dF = adF[i] + ((adOffset==NULL) ? 0.0 : adOffset[i]); dTotalAtRisk += adWeight[i]exp(dF); if(adDelta[i]==1.0) { dL += adWeight[i](dF - log(dTotalAtRisk)); dW += adWeight[i]; } }

return -2*dL/dW;

}

has two key values, dL and dW. dL is calculating

\sum (w_i \delta_i (f(x_i) - \log R_i)

and dW is calculating

\sum w_i \delta_i

If we expand the formula for deviance in the vignette we would get

-2 \sum (w_i \delta_i [ f(x_i) - \log R_i + \log w_i ] )

the calculation for dL is valid but dW seems to be wrong. We would want dw to be calculating

\sum (w_i \log w_i)

and the final answer should combine the values dL and dW via

-2 * (dL + dW)

— Reply to this email directly or view it on GitHub https://github.com/harrysouthworth/gbm/issues/22.

harrysouthworth commented 9 years ago

This issue was moved to gbm-developers/gbm#10