wengngai / NSSF_IBMs

Spatially explicit individual based models of tree species in the Nee Soon Freshwater Swamp Forest (NSSF) catchment in Singapore

Recruit dispersal and how (not) to kill them #13

Open · hrlai opened this issue 3 years ago

hrlai commented 3 years ago

Currently for the dispersal of recruits, we do:

  1. Calculate number of recruits
  2. Disperse them using the 2DT kernel
  3. Kill recruits that are too close to one another

But because step 3 happens after step 1, the final number of recruits ends up lower than what the observational model predicts.

In other words, when we collect data in the field to parameterise the recruitment model, the observed recruits have already gone through step 3 (?). So do we still need to do step 3?

If we still want to ensure that no recruits are too close to one another, we should do that during step 2, but I haven't thought of an elegant solution.

An alternative is not to kill the recruits at all, but rather to let competition eventually thin them out through mortality.
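For concreteness, a minimal sketch of steps 2 and 3 as described above. This is not the repo's actual code: it assumes a common 2Dt radial kernel, f(r) = 2*p*r/u * (1 + r^2/u)^(-(p+1)), and a hypothetical 10 cm exclusion distance for step 3.

```r
# Step 2: disperse n recruits from a parent at (x0, y0) using the 2Dt kernel
disperse_2dt <- function(n, x0, y0, u, p) {
    r     <- sqrt(u * (runif(n)^(-1 / p) - 1))   # inverse-CDF draw of dispersal distance
    theta <- runif(n, 0, 2 * pi)                 # uniform random direction
    data.frame(x = x0 + r * cos(theta), y = y0 + r * sin(theta))
}

# Step 3: kill the later-placed recruit of any pair closer than min_dist
thin_recruits <- function(xy, min_dist = 0.1) {
    d    <- as.matrix(dist(xy))
    keep <- !apply(lower.tri(d) & d < min_dist, 1, any)
    xy[keep, ]
}

recruits <- disperse_2dt(50, x0 = 0, y0 = 0, u = 5, p = 1)
nrow(thin_recruits(recruits))   # usually < 50, i.e. fewer recruits than step 1 predicted
```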

wengngai commented 3 years ago

Good point. The problem is that the seedling survival function looks at the combined BA of all conspecifics within a 20 m² neighbourhood, with no consideration of the proximity of any one neighbour. So the amount of intraspecific pressure exerted by a neighbour 2 cm away is the same as that exerted by a neighbour 2 m away. That said, I did do a bit of playing around, and the number of seedling neighbours that recruit within 10 cm of each other is quite low in most cases, so I think we might get away without killing them, even if they are very close. If not, what I think we could do is populate the landscape with new recruits one by one in a loop; if a seedling already exists in the location a new recruit is meant to occupy, the random location is generated again. But that would be extremely time-consuming...
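A rough, hypothetical sketch of that one-by-one idea (not repo code), reusing disperse_2dt() from the sketch above; existing_xy stands for a data.frame with columns x and y holding the coordinates of seedlings already present:

```r
place_recruits_one_by_one <- function(n, x0, y0, u, p, existing_xy,
                                      min_dist = 0.1, max_tries = 100) {
    placed <- existing_xy
    out    <- vector("list", n)
    for (i in seq_len(n)) {
        for (try in seq_len(max_tries)) {
            cand <- disperse_2dt(1, x0, y0, u, p)
            d2   <- (placed$x - cand$x)^2 + (placed$y - cand$y)^2
            if (all(d2 > min_dist^2)) break   # accept if no neighbour within min_dist
        }
        # if max_tries is exhausted, the last candidate is accepted regardless
        out[[i]] <- cand
        placed   <- rbind(placed, cand)
    }
    do.call(rbind, out)
}
```

Note that the re-drawing effectively samples from the kernel conditional on the space not already occupied, which is the distributional worry raised in the next comment.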

hrlai commented 3 years ago

we might get away without killing them, even if they are very close

Yes, this is my thought too; maybe we could just skip that step, which is also one of the slowest steps in the simulation.

populate the landscape with new recruits one by one in a loop; if a seedling already exists in the location a new recruit is meant to occupy, the random location is generated again.

Glad we're thinking the same thing! But I have a feeling there are pitfalls to this method (say, the resulting distribution may no longer follow the 2DT kernel), though I can't quite put my finger on it... It is as if, at every step, you're truncating the 2DT kernel (from the inside), taking a 10-cm-wide chunk out of it, and we would somehow have to make sure the new 2DT kernel integrates to the right probability before sampling the next recruit... I am more worried about this than about the time, because we are already looping over recruits to compute inter-recruit distances.

hrlai commented 3 years ago

On a related note, did you notice that sometimes the fruiting volume is really high for some individuals? For example, at time step 1 a Prunus tree can produce up to 5099081 fruits!!! Can this inflate the number of recruits too much? It can make the simulation very slow and unrealistic...

If you're also seeing this happening on your side, I think the culprit is the exponential back-transformation of the log-Gaussian model. We may really need to model fruiting volume with a GLM, at least a Poisson one. Speaking of which, fruiting volume is also currently a deterministic process in the IBM, without any variance (not even the log-Gaussian model's residual variance).
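To illustrate the deterministic-vs-stochastic point (mu_hat and sigma_hat below are hypothetical stand-ins for the fitted log-scale mean and residual SD of the log-Gaussian fruiting model):

```r
mu_hat    <- 8.5   # hypothetical fitted value on the log scale
sigma_hat <- 1.2   # hypothetical residual SD

exp(mu_hat)                                     # current IBM behaviour: one fixed fruit count
rlnorm(5, meanlog = mu_hat, sdlog = sigma_hat)  # with residual variance: varies per draw
# note E[Y] = exp(mu + sigma^2/2) > exp(mu), so the back-transformation also biases the mean
```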

wengngai commented 3 years ago

On a related note, did you notice that sometimes the fruiting volume is really high for some individuals? For example, at time step 1 a Prunus tree can produce up to 5099081 fruits!!! Can this inflate the number of recruits too much? It can make the simulation very slow and unrealistic...

If you're also seeing this happening on your side, I think the culprit is the exponential back-transformation of the log-Gaussian model. We may really need to model fruiting volume with a GLM, at least a Poisson one. Speaking of which, fruiting volume is also currently a deterministic process in the IBM, without any variance (not even the log-Gaussian model's residual variance).

I've now remodelled fruiting volume with a negative binomial model instead (Poisson was overdispersed). Updated parameters are in "fruiting parameters Jul21.csv". I will make changes to the code to use these parameters instead of the linear model's, if you think this is better. There's also the dispersion parameter, which would contribute to generating some variance. I hope this helps?
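For the record, a rough sketch of that check and refit, with simulated data and a hypothetical formula (fruits ~ dbh_z); it is not the actual NSSF fitting script, but it uses glmer.nb from lme4, which is what was used (see further down):

```r
library(lme4)
library(MASS)   # rnegbin, to simulate overdispersed counts

set.seed(1)
dat <- data.frame(species = factor(rep(letters[1:5], each = 40)),
                  dbh_z   = rnorm(200))
sp_eff     <- rnorm(5, 0, 0.4)
dat$fruits <- rnegbin(200,
                      mu = exp(3 + 0.8 * dat$dbh_z + sp_eff[as.integer(dat$species)]),
                      theta = 1.5)

# Poisson fit and a quick overdispersion check (ratio >> 1 means overdispersed)
fit_pois <- glmer(fruits ~ dbh_z + (1 | species), family = poisson, data = dat)
sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# Negative binomial refit, as done for "fruiting parameters Jul21.csv"
fit_nb <- glmer.nb(fruits ~ dbh_z + (1 | species), data = dat)
fixef(fit_nb)                     # intercept + slope (cf. fruited.int, fruited.z)
getME(fit_nb, "glmer.nb.theta")   # dispersion parameter theta
```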

hrlai commented 3 years ago

Nice. Yeah, it should help to generate more reasonable integers. It's also cool that the process now becomes stochastic. :)

wengngai commented 3 years ago

Sorry, I need a bit of help with this. I'm a bit unsure how to translate the parameter estimates into predictions now that we're using a negative binomial. I have 2 questions:

  1. Do the equations in our R markdown file still apply? I'm thinking they should, because the NB uses a log link function, so apart from the additional dispersion parameter the intercept and slope should be almost identical? But if you compare the values of fruited.int and fruited.z in "fruiting parameters Apr21.csv" (logged linear model) against those in "fruiting parameters Jul21.csv" (NB model), the values are really different, which is what is making me doubt myself.
  2. I'm planning to use MASS::rnegbin to generate the random recruitment numbers, and wanted to check if I'm doing things correctly. mu = \rho_{0,j} + \phi_{D,j} D_{ij} and theta = theta should give us what we want, right? Note that this is not the exact same set of parameters estimated by the NB model, since we change the intercept to account for fruits which didn't make it to recruits.

hrlai commented 3 years ago

Do the equations in our R markdown file still apply?

Yes, the linear predictor equation should still be identical. I am not sure the intercepts and slopes will be identical though, mainly because of Jensen's inequality (i.e., the log of the mean is not the mean of the logs).
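A tiny simulated illustration of that point: the log-Gaussian intercept targets the mean of the logs, while the NB (log-link) intercept targets the log of the mean.

```r
set.seed(1)
y <- rlnorm(1e5, meanlog = 2, sdlog = 1)
mean(log(y))   # ~ 2.0
log(mean(y))   # ~ 2.5 (= meanlog + sdlog^2/2 for a lognormal), so the intercepts differ
```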

Previously, the log-linear Gaussian model was:

log Y ~ Normal(mu, sigma)
mu = f(X)

But the negative binomial model is:

Y ~ NB(mu, phi)
log mu = f(X)

Notice that previously we estimated the mean of log Y, but now we estimate the log of the mean (mu). As long as the fitted values are similar to the observed values, I'd trust the new negative binomial GLM :) Another quick-and-dirty check is to simulate a lot of random values using rnbinom with the estimated parameters and see if the histogram is well within a reasonable range.
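Something along these lines, with hypothetical values standing in for the fitted fruited.int, fruited.z and theta:

```r
int_hat   <- 2.0
slope_hat <- 0.9
theta_hat <- 1.5
dbh_z     <- 1.0   # covariate value to predict at

mu_hat <- exp(int_hat + slope_hat * dbh_z)              # NB uses a log link
sims   <- rnbinom(1e4, mu = mu_hat, size = theta_hat)   # base R parameterisation
hist(sims, breaks = 50)
quantile(sims, c(0.5, 0.95, 0.99))   # are these within a sensible fruit/recruit range?
```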

planning to use MASS::rnegbin to generate the random recruitment numbers, and wanted to check if I'm doing things correctly

Which R package did you use to fit the negative binomial GLM? They differ a bit in how they specify the mean and dispersion of the NB, which is always a headache. For instance, the built-in rnbinom seems different from MASS::rnegbin.
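For reference, a quick way to compare the two parameterisations with arbitrary values; as far as I can tell, rnbinom's size plays the role of rnegbin's theta once mu is supplied, but worth double-checking against whatever glmer.nb reports:

```r
library(MASS)
set.seed(42)
mu <- 20; theta <- 1.5

a <- rnegbin(1e5, mu = mu, theta = theta)   # MASS parameterisation
b <- rnbinom(1e5, mu = mu, size = theta)    # base R, with size in place of theta
c(mean(a), mean(b))   # both ~ mu
c(var(a),  var(b))    # both ~ mu + mu^2 / theta
```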

wengngai commented 3 years ago

Thanks for the explanation. Makes sense now. I used glmer.nb() to fit the models. Let me try to make some simulations to compare the log-linear versus NB parameters tomorrow...
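A minimal sketch of what that comparison might look like; the intercepts, slopes, sigma and theta below are hypothetical placeholders, and the real values come from the Apr21 and Jul21 CSVs:

```r
ln_int <- 8.0; ln_slope <- 1.1; ln_sigma <- 1.0   # log-Gaussian model ("Apr21", hypothetical)
nb_int <- 8.5; nb_slope <- 1.1; nb_theta <- 1.5   # negative binomial model ("Jul21", hypothetical)

dbh_z <- seq(-2, 2, by = 0.5)

# expected fruit counts implied by each model
mean_ln <- exp(ln_int + ln_slope * dbh_z + ln_sigma^2 / 2)   # lognormal mean needs + sigma^2/2
mean_nb <- exp(nb_int + nb_slope * dbh_z)                    # NB mean is exp(linear predictor)

round(cbind(dbh_z, mean_ln, mean_nb))
```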