stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

Enhancement for `penalty` #176

Open fweber144 opened 3 years ago

fweber144 commented 3 years ago

This is an enhancement request, not a real issue. And it's not very urgent, I would say.

In my opinion, it's kind of hard for the user to specify the `penalty` argument of `varsel()` and `cv_varsel()`, for two reasons:

  1. penalty expects a vector of indices (for the coefficients, not the terms) and it's kind of difficult to know the "table" to which the indices refer (with "table", I mean the full vector of coefficient indices together with the mapping from indices to coefficients).
  2. Categorical predictors with more than 2 categories have more than one coefficient, so it gets even harder to know the "table" to which the indices refer.

So my suggestion would be to:

  1. Add a convenience function giving the possible coefficient names to the user. I'm not 100% sure, but I think they are the column names of the "model.matrix" from these lines: https://github.com/stan-dev/projpred/blob/a6c32fe04abb9b570ad7addcc9967267796b8023/R/search.R#L148-L151 Btw, that new function could also offer an optional argument for returning the possible terms (not coefficient names). For that, I think it could rely on https://github.com/stan-dev/projpred/blob/9b906cf0871d40919446ad6e92f442d77f939b3b/R/project.R#L142-L148
  2. Either stop here or go ahead and let penalty also accept the coefficient names, not only the coefficient indices.
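
To illustrate the "table" meant above, here is a minimal sketch in base R only (it does not call projpred). It assumes that the coefficients `penalty` refers to are the non-intercept columns of the model matrix, in that order, which I have not verified against the linked lines. The toy data and the helper `penalty_from_names()` are hypothetical and only show what accepting coefficient names could look like.

```r
# Toy data: one numeric predictor and one 3-level factor (hypothetical names).
dat <- data.frame(
  x1 = c(0.5, 1.2, -0.3, 2.1, 0.0, 1.7),
  grp = factor(c("a", "b", "c", "a", "b", "c"))
)

# Column names of the model matrix, dropping the intercept.
# Assumption: these are the coefficients that `penalty` is matched against.
mm <- model.matrix(~ x1 + grp, data = dat)
coef_names <- setdiff(colnames(mm), "(Intercept)")

# The "table": coefficient index, coefficient name, and originating term.
term_labels <- attr(terms(~ x1 + grp), "term.labels")
assign_idx <- attr(mm, "assign")[colnames(mm) != "(Intercept)"]
coef_table <- data.frame(
  index = seq_along(coef_names),
  coefficient = coef_names,
  term = term_labels[assign_idx]
)
print(coef_table)

# Hypothetical helper: turn a named vector of penalties into a positional
# one, using `default` for every coefficient that is not named.
penalty_from_names <- function(named_penalty, coef_names, default = 1) {
  unknown <- setdiff(names(named_penalty), coef_names)
  if (length(unknown) > 0) {
    stop("Unknown coefficient name(s): ", paste(unknown, collapse = ", "))
  }
  penalty <- rep(default, length(coef_names))
  penalty[match(names(named_penalty), coef_names)] <- named_penalty
  penalty
}

# Both dummy columns of `grp` get the same penalty; `x1` keeps the default.
penalty_vec <- penalty_from_names(c(grpb = 0, grpc = 0), coef_names)
```

The factor `grp` contributes two rows to `coef_table` (one per non-reference level), which is exactly why counting indices by hand gets error-prone for categorical predictors.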

AlejandroCatalina commented 3 years ago

This definitely makes sense and would be great to have. To be honest, there has not been real demand for this feature, and I guess users of this package mostly stick to the default options, so I agree with your assessment that this is not too urgent. It also shouldn't be complicated, though, so I'll see if I get some time to do it!

Thanks!

fweber144 commented 3 years ago

Probably related to #66.