stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0
388 stars 133 forks source link

use COD to create sufficient statistic #555

Closed bgoodri closed 2 years ago

bgoodri commented 3 years ago

fixes #534

bgoodri commented 3 years ago

I can't bring myself to merge this but if @jgabry does, then I'll live with it.

jgabry commented 3 years ago

I can't bring myself to merge this but if @jgabry does, then I'll live with it.

Can you elaborate? I'm not sure what the pros and cons of this are, so not sure how to evaluate.

bgoodri commented 3 years ago

It changes the likelihood function when X is collinear (compared to glm, where there is no unique MLE) to produce an answer that conditions on one out of an infinite number of "sufficient" statistics (that are not actually sufficient), which is a valid posterior distribution but not one that corresponds to anyone's beliefs about the parameters. It also differs from the answer you would get with the same seed if you specify sparse = TRUE, even though the data are not actually sparse, so that you condition on the data rather than the sufficient statistic. Presumably, there is some way to construct X such that the columns are perfectly collinear but not numerically perfectly collinear enough to go through the code path that implements the new behavior and conversely some way to construct an X that is not perfectly collinear but is sufficiently numerically collinear to go through the branch that the code path that implements the new behavior. It could be seen as adding additional prior information that the user is not aware of and can't change, except to specify sparse = TRUE to avoid it.

On Tue, Nov 2, 2021 at 1:53 PM Jonah Gabry @.***> wrote:

I can't bring myself to merge this but if @jgabry https://github.com/jgabry does, then I'll live with it.

Can you elaborate? I'm not sure what the pros and cons of this are, so not sure how to evaluate.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/pull/555#issuecomment-957990379, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKUUEGPM46ZDCGXCNY3UKAXUHANCNFSM5HGYQ7CQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

jgabry commented 3 years ago

Ok I think I follow most of that. If I understand you correctly, this part

Presumably, there is some way to construct X such that the columns are perfectly collinear but not numerically perfectly collinear enough to go through the code path that implements the new behavior and conversely some way to construct an X that is not perfectly collinear but is sufficiently numerically collinear to go through the branch that the code path that implements the new behavior.

means that this doesn't actually guarantee that Andrew gets the desired behavior when there's perfect collinearity and it also doesn't guarantee that when there's no collinearity we get exactly the same result as before this PR. Is that right? (If that's the case then I think we should probably discuss this more with Andrew because I don't think he's necessarily aware of these subtleties. I definitely wasn't aware until now.)

jgabry commented 3 years ago

I'm gonna send an email to you and Andrew about this. I'm not entirely comfortable merging this yet (for the reasons you described). That said, I did just test it and it does work well (or at least it runs well) on the test case you included. So this seems good to go if we decide to use it.

hsbadr commented 2 years ago

The problem with this kind of change is that it needs to be maintained separately. For example, changes in Stan/Math or RcppEigen may affect the templates for CODOLS, such as the broken csr_matrix_times_vector2 with Stan ~v2.28. While this might be an easy task, I'd rather prefer to completely rely on Stan/Math here (and, BTW, remove csr_matrix_times_vector2 too by replacing it with csr_matrix_times_vector in the Stan code). Any required changes for performance would be better included in Math, probably with a simple interface to disable time-consumin checks, for example.

hsbadr commented 2 years ago

@bgoodri FYI. With the development version of RStan + master branch of RStanARM:

In file included from stan_files/continuous.cc(3):
stan_files/continuous.hpp(6025): error: no instance of overloaded function "model_continuous_namespace::CODOLS" matches the argument list
            argument types are: (Eigen::Matrix<double, -1, -1, 0, -1, -1>, Eigen::Map<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0>>, std::ostream *)
          stan::model::assign(OLS, CODOLS(X_, y, pstream__),
                                   ^
../inst/include/CODOLS.hpp(14): note: this candidate was rejected because at least one template argument could not be deduced
  CODOLS(const Eigen::Matrix<T2__, Eigen::Dynamic, Eigen::Dynamic>& X,
  ^
stan_files/continuous.hpp(3665): note: this candidate was rejected because at least one template argument could not be deduced
    CODOLS(const T0__& X_arg__, const T1__& y_arg__, std::ostream* pstream__) ;
    ^

In file included from stan_files/continuous.cc(3):
stan_files/continuous.hpp(6025): error: no instance of overloaded function "stan::model::assign" matches the argument list
            argument types are: (Eigen::Map<Eigen::Matrix<double, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0>>, <error-type>, const char [23])
          stan::model::assign(OLS, CODOLS(X_, y, pstream__),
          ^

558 compiles successfuly.

andrjohns commented 2 years ago

This is a pretty easy fix - the templates just need to be updated for Eigen expressions (and calling to_ref on the inputs and possibly returning via make_holder). Should I make the change or did you want to handle this Ben?

bgoodri commented 2 years ago

I think I got it in my last few commits.

On Tue, Mar 15, 2022 at 11:18 PM Andrew Johnson @.***> wrote:

This is a pretty easy fix - the templates just need to be updated for Eigen expressions (and calling to_ref on the inputs and possibly returning via make_holder). Should I make the change or did you want to handle this Ben?

— Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/pull/555#issuecomment-1068688844, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKVK3Q7MQ2FNZHKRVV3VAFHHRANCNFSM5HGYQ7CQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

hsbadr commented 2 years ago

Yes, fixed by b7b25798b76e98b7579205d8e78f0387d45911fc.