Document weights - Githubissues

ralikwen commented 2 years ago

Please add the possibility of using weighted cases.

sometimes you have weighted data and you just can't help it
sometimes you have a large dataset and it is easier to work with it using weights

... segmented does take weights

lindeloev commented 2 years ago

This was added in v0.3.0, but it's not documented yet. Just add | weights(weight_col) after the response, i.e., write y | weights(weight_col) ~ ... on the left-hand side of the first formula. E.g.:

model = list(
  y | weights(weight_col) ~ 1 + x,
  ~ 0 + x
)

The weights are visualized as dot size in plot() but otherwise only take effect during sampling. Does this solve your problem?
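For future readers, here is a minimal sketch of the whole workflow around the model above. The simulated data, the column names x and y, and the change point at x = 40 are made up for illustration; only the weights(weight_col) syntax comes from this thread. The fitting step is guarded because mcp needs a working JAGS installation.

```r
# Simulate a joined two-segment trend with per-row weights (illustrative data only).
set.seed(1)
n <- 100
df <- data.frame(x = seq_len(n))
df$y <- ifelse(df$x < 40,
               10 + 0.1 * df$x,          # first segment: intercept and slope
               14 + 0.5 * (df$x - 40)) + # second segment: joined, steeper slope
  rnorm(n)
df$weight_col <- runif(n, min = 0.5, max = 2)  # hypothetical positive weights

# Model from the comment above: weights go after the response in the first formula.
model <- list(
  y | weights(weight_col) ~ 1 + x,
  ~ 0 + x
)

# Fit and plot only if mcp (and therefore JAGS) is available on this machine.
if (requireNamespace("mcp", quietly = TRUE)) {
  fit <- mcp::mcp(model, data = df)
  print(plot(fit))  # weights show up as dot size
}
```

The formulas are not evaluated when the list is created, so the model can be defined before the data exists.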

In any case, I'll keep this issue open as a reminder that weights should be documented better. So if anything raised doubt or seems non-intuitive to you, I'd be grateful for your feedback so I can write it up :-)

ralikwen commented 2 years ago

This does solve my problem. Great package, thanks a lot.

ralikwen commented 2 years ago

I posted a related question on Stack Overflow. With your help I could put together something that is close to an answer: https://stackoverflow.com/questions/70056988/comparing-segmented-models-in-r/70063671#70063671 It is still not clear to me how to interpret the result of the model comparison, though: what is a large difference, and what is a significant difference? I would be grateful for your insights. Many thanks.

lindeloev commented 2 years ago

I've written a bit about interpreting ELPD differences here: https://lindeloev.github.io/mcp/articles/comparison.html#what-is-loo-cv. See also this thread by LOO champion Aki Vehtari: https://discourse.mc-stan.org/t/interpreting-elpd-diff-loo-package/1628.

It's a bit involved, but let me know if anything is unclear. And it would be great if you could update your StackOverflow reply with anything you learn - or perhaps just this link. I'm sure many future users would appreciate that.
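For context, a rough sketch of how the ELPD difference discussed here can be obtained for two fitted models. It follows the pattern used in the linked comparison article (compute LOO per fit, then compare); compare_elpd, fit_a and fit_b are hypothetical names, and it assumes that loading mcp registers a loo method for its fit objects.

```r
# Sketch: compare two fitted mcp models by estimated out-of-sample predictive accuracy.
# fit_a and fit_b are assumed to be objects returned by mcp::mcp().
compare_elpd <- function(fit_a, fit_b) {
  # Loading the mcp namespace is assumed to register the loo method for mcp fits.
  stopifnot(requireNamespace("mcp", quietly = TRUE),
            requireNamespace("loo", quietly = TRUE))
  loo_a <- loo::loo(fit_a)
  loo_b <- loo::loo(fit_b)
  # Returns a matrix with elpd_diff and se_diff; the best model is listed first.
  loo::loo_compare(loo_a, loo_b)
}
```

Calling compare_elpd(fit_a, fit_b) on two real fits gives the elpd_diff and se_diff columns that the links above help interpret.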

ralikwen commented 2 years ago

Hi,

I have updated my StackOverflow reply. As I see it, the bottom line is quite straightforward:

less than 2 - no difference
more than 5 - significant difference
else - we don't know

Would be great to find some citable resource to this effect. Thanks a lot. B.

lindeloev commented 2 years ago

I guess cross-validation (and Bayesian inference in general) is more about quantifying evidence than threshold-like decisions (significant/non-significant).

There's a deeper exploration of the difficulty of estimating the elpd uncertainty here: https://arxiv.org/abs/2008.10296. My takeaway: elpd-diff, taken relative to its standard error, can be interpreted as a z-score if the models:

Have not-too-similar predictions. One often compares very similar models, so this is a frequent limitation, I think.
Are not misspecified.
Are fitted to non-small data.
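To make the z-score reading concrete, here is a small sketch in base R. The numbers are invented, standing in for the elpd_diff and se_diff columns of a loo_compare() result, and the normal approximation is only reasonable when the three conditions above hold.

```r
# Hypothetical values, as they might appear in a loo::loo_compare() table.
elpd_diff <- -7.4  # ELPD of the worse model minus ELPD of the best model
se_diff   <- 3.1   # standard error of that difference

# Ratio of the difference to its standard error, read like a z-score.
z <- elpd_diff / se_diff

# Under a normal approximation: probability that the true difference is below zero,
# i.e. that the model listed second really does predict worse.
p_worse <- pnorm(0, mean = elpd_diff, sd = se_diff)

cat(sprintf("z = %.2f, P(diff < 0) = %.3f\n", z, p_worse))
```

This quantifies evidence on a continuous scale rather than giving a significant/non-significant verdict.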

lindeloev / mcp

Document weights #132