jscamac closed this issue 9 years ago
Argh! Forgot to constrain the sigmas in the measurement-error part of the model. This change seems to have improved speed. Rerunning tests to confirm.
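For context, a minimal sketch of the kind of fix described here, assuming the model is written in Stan (the actual code isn't shown in this thread, and all names below are hypothetical): declaring the measurement-error sd with a lower bound of zero means every proposal is a valid standard deviation instead of being rejected during sampling.

```stan
// Minimal sketch only; not the project's actual code. True dbh is treated as
// data here purely to keep the example short and focused on the constraint.
data {
  int<lower=1> N;
  vector<lower=0>[N] dbh_true;   // hypothetical: "true" diameters
  vector<lower=0>[N] dbh_obs;    // hypothetical: observed diameters
}
parameters {
  real<lower=0> sigma_obs;       // the <lower=0> is the missing constraint
}
model {
  sigma_obs ~ cauchy(0, 2.5);    // half-Cauchy once the bound is in place
  dbh_obs ~ normal(dbh_true, sigma_obs);
}
```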
But I will still write the alternative version outlined above, as there might still be some efficiency gain from estimating the true growth rate only once.
Measurement error was removed from the mortality model. I dropped the mixture model as it was too slow and sampled some observations very poorly (some observations had two solutions/means under the mixture model). Measurement error in dbh is now estimated in a separate model, and the posterior means are then used in the mortality analysis.
Preliminary runs suggest this is much faster and sampling is more efficient. So much so that I think we can halve the number of iterations we run, from 2000 to 1000.
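A rough sketch of what the stand-alone dbh measurement-error model might look like (an assumed structure with hypothetical names, not the repository's actual code). It is fitted on its own; the posterior mean (or median) of each latent dbh is then exported and treated as known in the mortality analysis.

```stan
data {
  int<lower=1> N;                  // number of dbh observations
  vector<lower=0>[N] dbh_obs;      // observed diameters
}
parameters {
  vector<lower=0>[N] dbh_true;     // latent "true" diameters, one per observation
  real<lower=0> sigma_obs;         // measurement-error sd
}
model {
  dbh_true ~ lognormal(3, 1);      // placeholder prior on true size
  sigma_obs ~ cauchy(0, 2.5);
  dbh_obs ~ normal(dbh_true, sigma_obs);
}
```

The point estimates of dbh_true are then summarised outside Stan and passed to the mortality model as ordinary data.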
A preliminary run beyond 10 iterations has revealed that sampling is extremely slow (<100 iterations after 19 hours).
This is probably due to the model trying to do too much. Specifically, the model consists of 5 submodels (see the parameter sketch after this list):
Two models to estimate true dbh for both measurements required to calculate a growth rate (i.e. each applied to ~160,000 observations).
Another two models to estimate the true dbhs required to estimate growth rate in the holdout data (~18,000 observations).
and... one model to calculate hazard rates.
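To make that concrete, here is a skeleton of what the joint model's parameter block implies (illustrative only, hypothetical names, and assuming the counts above are per census): every observed diameter carries its own latent parameter, so the sampler has to explore on the order of 2 x 160,000 + 2 x 18,000, roughly 356,000, latent dbh values on top of the hazard-rate parameters.

```stan
data {
  int<lower=1> N_fit;    // ~160,000 observations per census in the fitting data
  int<lower=1> N_hold;   // ~18,000 observations per census in the holdout data
}
parameters {
  vector<lower=0>[N_fit]  dbh1_true;       // census 1, fitting data
  vector<lower=0>[N_fit]  dbh2_true;       // census 2, fitting data
  vector<lower=0>[N_hold] dbh1_true_hold;  // census 1, holdout data
  vector<lower=0>[N_hold] dbh2_true_hold;  // census 2, holdout data
  // ... hazard-rate parameters omitted ...
}
model {
  // measurement-error and hazard likelihoods omitted from this sketch
}
```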
Because of this I've decided to separate the measurement model from the mortality model. This means I will first estimate true dbh for all observations used. The median estimates will then be fed into the mortality model used to predict hazard rates.
The advantage of this approach is that true dbh only needs to be estimated once per individual. Feeding in the median or mean estimates will also reduce the number of parameters the mortality model has to sample.
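And a sketch of the downstream mortality model under that approach (again an assumed structure with hypothetical names, not the actual code): the posterior-median diameters enter as plain data, the growth rate is computed once in transformed data, and only the hazard parameters are left for the sampler.

```stan
data {
  int<lower=1> N;                          // number of individuals
  vector<lower=0>[N] dbh1_med;             // posterior-median dbh, first census
  vector<lower=0>[N] dbh2_med;             // posterior-median dbh, second census
  vector<lower=0>[N] census_interval;      // years between censuses
  array[N] int<lower=0, upper=1> died;     // 1 = dead at second census
}
transformed data {
  vector[N] growth = (dbh2_med - dbh1_med) ./ census_interval;   // dbh growth per year
}
parameters {
  real alpha;                              // baseline log-hazard
  real beta;                               // effect of growth rate on log-hazard
}
model {
  vector[N] hazard = exp(alpha + beta * growth);   // per-year hazard rate
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  for (n in 1:N)
    // probability of dying within the census interval under a constant hazard
    died[n] ~ bernoulli(1 - exp(-hazard[n] * census_interval[n]));
}
```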