Open mikemeredith opened 7 years ago
@ito4303 Would you mind looking at this?
@mikemeredith Thanks for the carefully constructed issue. Is this a problem with the BUGS code in the original book or just an issue with the translation?
I have a complete replication of Dorazio and Royle's occupancy model here:
http://mc-stan.org/documentation/case-studies/dorazio-royle-occupancy.html
I hope that's a faithful translation of their cited paper, but if not, please let me know, as I understand that model better than the BPA models. In the Dorazio and Royle approach, the result gets bounded between observed and max (pseudo-sites) because you start with the occupied ones (they're observed) then add in expected occupance of other ones, which has to be between 0 and 1. So we don't need to declare constraints to enforce the constraint.
Exceeding the constraints is a symptom, not the problem; enforcing constraints isn't the solution but is a check. The problem is that, in the call to binomial_rng
, N
is too big and theta
is too small. That will often give a plausible result, but is still wrong.
In the BUGS code, they use a latent binary variable, z[i]
, for occupancy for each site, and then occ_fs = sum(z)
. They don't need to do the kind of posterior predictive thing. So it's an issue with the translation.
If I were really using Stan (instead of translating), I'd get the conditional (on the data) probability of occupancy for each site and sum those to get an expectation for occ_fs
.
I'll look at the Dorazio et al stuff later, but what you describe is I think the right way to do it and what's needed for the BPA Ch.13 examples and the closed-capture examples in Ch.6.
@mikemeredith I greatly thank you for pointing out the problems. As you pointed out, they are incorrect translations.
I will contribute corrected models as soon as possible. Or, perhaps you can create pull requests with the corrected code.
@ito4303 I'll do pull requests for the models I've looked at so far. Give me a day or 2.
Great! Thanks.
On Mar 17, 2017, at 10:50 PM, mikemeredith notifications@github.com wrote:
@ito4303 I'll do pull requests for the models I've looked at so far. Give me a day or 2.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
@bob-carpenter There are issues with the write-up of the Dorazio & Royle analysis, but I think the math is all ok. Where would be a good forum to discuss this?
An issue in this repo---the source is in example-models/knitr/dorazio-royle-occupancy/
@bob-carpenter I've done a pull request.
(I'm a Stan newbie, but I have played with occupancy models a lot, including the WinBUGS and JAGS code in BPA.)
These are occupancy studies, and
occ_fs
is the number of sites in the sample which are occupied. As such, it cannot be less than the number of sites observed to be occupied (occ_obs
) or greater than the total number of sites in the sample (R
). The example code allows these constraints to be violated; it's just (bad!) luck that this doesn't happen with the example data from BPA.A key idea is the conditional probability of occupancy,
psi_con
, conditional on the data. So if the species is not detected, we know the site is in the yellow area in the diagram below.Model
site_occ.stan
The simple model has one value for
psi
and one forp
, so all sites with no detections have the same value forpsi_con
, calledpsi_nd
in the code below:Model
site_occ_cov.stan
Here each site has a different
psi
andp
, and a good strategy is to calculatepsi_con
for each, which is in any case a result you may want to monitor.Model
bluebug.stan
The same strategy can be applied here, this time allowing for the variable number of visits to each site. So we replace
with
Other models
I haven't looked closely at the dynamic models, but it appears that the reported
z
andn_occ
values are also based on unconditionalpsi
values. Of more concern are the closed-capture models in Ch.6, where the analogous value, the population sizeN
, appears to have the same problem; this has a smaller numerical impact, as the denominator of the conditional inclusion is always large (ie, close to 1), but should be fixed.