stan-dev / loo

loo R package for approximate leave-one-out cross-validation (LOO-CV) and Pareto smoothed importance sampling (PSIS)
https://mc-stan.org/loo
Other
150 stars 34 forks source link

problems using loo in clogit models #44

Open bgoodri opened 7 years ago

bgoodri commented 7 years ago

In a rstanarm branch, I introduced a stan_clogit function which is similar to the clogit function in the survival package but is actually described in more detail by Stata ( http://www.stata.com/manuals13/rclogit.pdf ). So, you will be able to do

post <- stan_clogit(case ~ spontaneous + induced, strata = stratum,
                            data = infert[order(infert$stratum, !infert$case),], QR = TRUE)

The problem is that some groups have the same values on the dummy variables spontaneous and induced, such as

> infert[infert$stratum == 3,]
    education age parity induced case spontaneous stratum pooled.stratum
3      0-5yrs  39      6       2    1           0       3              4
86     0-5yrs  39      6       2    0           0       3              4
168    0-5yrs  39      6       2    0           0       3              4

Thus, no matter what are the posterior realizations of the coefficients on spontaneous and induced, the likelihood for group 3 is the same.

I am pretty sure the correct concept for a clogit model is to imagine leaving out one group rather than one observation, but when I call loo::loo.matrix, the third column of the input is a constant, which causes the Pareto k estimate to be infinite. Since the third group could be omitted and only change the log-likelihood by a constant, this seems to be unreasonable.

The question becomes, what function should be catching this? We could have stan_clogit drop groups that have only one unique row in the design matri{x,ces}. We could have loo.stanreg omit such groups. Or loo::loo could check which Pareto k estimates are infinite and change them to some number when the log likelihoods are finite but constant. Thoughts @avehtari ?

avehtari commented 7 years ago

I have been waiting this to happen.

loo should check (almost) constant columns, skip Pareto fit, and return equal weights.

jgabry commented 7 years ago

Isn't it possible to get almost constant columns because of problems with model/sampling? In that case how would loo know if it was ok to ignore (e.g. clogit case) or if it's an actual problem?

On Wed, Mar 29, 2017 at 3:05 PM Aki Vehtari notifications@github.com wrote:

I have been waiting this to happen.

loo should check (almost) constant columns, skip Pareto fit, and return equal weights.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290193206, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q6yom_e_0NZkN1iFaiIHMW1G6g9iks5rqqtygaJpZM4MtXmA .

bgoodri commented 7 years ago

I think if the chains never move, then the n_eff will be low and the user should not be calling loo() in the first place.

On Wed, Mar 29, 2017 at 3:50 PM, Jonah Gabry notifications@github.com wrote:

Isn't it possible to get almost constant columns because of problems with model/sampling? In that case how would loo know if it was ok to ignore (e.g. clogit case) or if it's an actual problem?

On Wed, Mar 29, 2017 at 3:05 PM Aki Vehtari notifications@github.com wrote:

I have been waiting this to happen.

loo should check (almost) constant columns, skip Pareto fit, and return equal weights.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290193206, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q6yom_e_ 0NZkN1iFaiIHMW1G6g9iks5rqqtygaJpZM4MtXmA

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290204939, or mute the thread https://github.com/notifications/unsubscribe-auth/ADOrqnyzHr975_DCJ-Vl9NoQvcQn70Saks5rqrX3gaJpZM4MtXmA .

jgabry commented 7 years ago

True. And that would trigger warnings from rstanarm.

On Wed, Mar 29, 2017 at 4:13 PM bgoodri notifications@github.com wrote:

I think if the chains never move, then the n_eff will be low and the user should not be calling loo() in the first place.

On Wed, Mar 29, 2017 at 3:50 PM, Jonah Gabry notifications@github.com wrote:

Isn't it possible to get almost constant columns because of problems with model/sampling? In that case how would loo know if it was ok to ignore (e.g. clogit case) or if it's an actual problem?

On Wed, Mar 29, 2017 at 3:05 PM Aki Vehtari notifications@github.com wrote:

I have been waiting this to happen.

loo should check (almost) constant columns, skip Pareto fit, and return equal weights.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290193206, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q6yom_e_ 0NZkN1iFaiIHMW1G6g9iks5rqqtygaJpZM4MtXmA

.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290204939, or mute the thread < https://github.com/notifications/unsubscribe-auth/ADOrqnyzHr975_DCJ-Vl9NoQvcQn70Saks5rqrX3gaJpZM4MtXmA

.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-290211442, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q6AVyc1jk1WQjnsFpG0MFwu_9K1uks5rqrtlgaJpZM4MtXmA .

avehtari commented 7 years ago

Another thing is that in clogit and some other models it would be better to leave one group out, which requires a bit of additional thinking how to implement that.

bgoodri commented 7 years ago

We have a workaround for this in the clogit case in a branch of rstanarm but it would be better to fix this for the next loo release.

jgabry commented 7 years ago

Is the only thing that needs to be fixed the false positive Pareto k warning?

jgabry commented 7 years ago

That is, is the only change to skip constant columns when doing the Pareto smoothing?

bgoodri commented 7 years ago

Yeah, what Aki said.

jgabry commented 7 years ago

Ok that should be pretty straightforward to take care of

On Fri, Aug 4, 2017 at 12:30 AM bgoodri notifications@github.com wrote:

Yeah, what Aki said.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/stan-dev/loo/issues/44#issuecomment-320153619, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4QxoAqmk4Rm7XychHBoct6By7Qtryks5sUp5mgaJpZM4MtXmA .