rmcelreath / rethinking

Statistical Rethinking course and book package

Panda_nuts Nut-Cracking Frequency? #434

Closed emstruong closed 1 month ago

emstruong commented 1 month ago

Hello,

I was going through Chapter 16 and noticed that the Panda_nuts data seemed a bit odd, in that one of the chimpanzees seems to have a very large number of measurements relative to the others.

I couldn't find a reference to this in the book or online, so I was wondering whether there may have been some kind of data-entry error. Regardless of whether it is an error, I thought it was worth remarking on.

library(rethinking)
#> Loading required package: cmdstanr
#> This is cmdstanr version 0.6.1
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: /home/XXX/.cmdstan/cmdstan-2.33.1
#> - CmdStan version: 2.33.1
#> Loading required package: posterior
#> This is posterior version 1.5.0
#> 
#> Attaching package: 'posterior'
#> The following objects are masked from 'package:stats':
#> 
#>     mad, sd, var
#> The following objects are masked from 'package:base':
#> 
#>     %in%, match
#> Loading required package: parallel
#> rethinking (Version 2.41)
#> 
#> Attaching package: 'rethinking'
#> The following object is masked from 'package:stats':
#> 
#>     rstudent
data("Panda_nuts")
hist(Panda_nuts$chimpanzee, breaks = unique(Panda_nuts$chimpanzee), right = FALSE)

Created on 2024-05-20 with reprex v2.0.2
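
For exact counts per individual, a table may be easier to read than the histogram above, since hist() bins values and, with these breaks and right = FALSE, would place the two largest IDs in the same final bin. The following is a minimal sketch, not part of the original reprex: `count_per_id` is a hypothetical helper, and the toy IDs only stand in for `Panda_nuts$chimpanzee`.

```r
# Count observations per individual without any binning.
# `count_per_id` is a hypothetical helper, not a rethinking function.
count_per_id <- function(ids) {
  counts <- table(ids)
  sort(counts, decreasing = TRUE)  # most-observed individuals first
}

# Toy IDs standing in for Panda_nuts$chimpanzee (made-up values)
toy_ids <- c(1, 1, 2, 3, 3, 3, 3, 3)
toy_counts <- count_per_id(toy_ids)
print(toy_counts)

# With the package installed, the same call applies to the real data.
if (requireNamespace("rethinking", quietly = TRUE)) {
  data("Panda_nuts", package = "rethinking")
  print(count_per_id(Panda_nuts$chimpanzee))
}
```

Sorting the counts puts any heavily observed individual at the top, which makes this kind of imbalance easy to spot.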

rmcelreath commented 1 month ago

Yes, some individuals were observed much more than others. There's a practice problem at the end of the chapter that focuses on this, I think.

emstruong commented 1 month ago

> Yes, some individuals were observed much more than others. There's a practice problem at the end of the chapter that focuses on this, I think.

Is the question in 2E? Afaict, the Qs seem to be focused on the more traditional aspects of the model, rather than the number of measurements per chimpanzee per se...

Anyway, my concern is that it's not immediately clear to me whether the same model is appropriate when the number of measurements per chimpanzee is skewed like this, so I just wanted to put this out there. I'm concerned that the validity or meaning of the parameters and the model can change with the number of measurements.

rmcelreath commented 1 month ago

Third edition, problem 16H2.

emstruong commented 1 month ago

> Third edition, problem 16H2.

Well, that's interesting. I'm curious then, am I just overthinking the number of measurements? :sweat_smile: What's your take?

rmcelreath commented 1 month ago

Whether imbalance matters or not depends upon the causal structure and the target of inference. So yeah, it can matter and is worth investigating in most cases. But often it will just be a matter of precision, not of bias.
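
The "precision, not bias" case can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not taken from the book or the package: two hypothetical individuals share the same true rate (`true_rate`, a made-up value), and the number of observations is unrelated to the outcome. When sampling effort does depend on the outcome or its causes, this sketch does not apply and bias becomes possible.

```r
set.seed(16)

# Hypothetical setup: two individuals with the same true rate,
# observed a very different number of times.
true_rate <- 2.0                 # assumed events per observation
n_obs <- c(few = 5, many = 200)  # assumed observation counts
n_sims <- 2000

# For each replicate, estimate each individual's rate by the sample mean.
estimates <- sapply(n_obs, function(n) {
  replicate(n_sims, mean(rpois(n, lambda = true_rate)))
})

# Average error: both estimates sit near the true rate.
bias <- colMeans(estimates) - true_rate
# Spread of estimates: much wider for the sparsely observed individual.
spread <- apply(estimates, 2, sd)

print(round(rbind(bias = bias, spread = spread), 3))
```

In this setup both estimates center on the true rate, while the sparsely observed individual's estimate varies far more from replicate to replicate.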

emstruong commented 1 month ago

> Whether imbalance matters or not depends upon the causal structure and the target of inference. So yeah, it can matter and is worth investigating in most cases. But often it will just be a matter of precision, not of bias.

That it depends on the causal structure and the target of inference makes sense to me.