pmahon3 / UWO-MOTUS-Analysis


Extra variation in hyperparameters #8

Open sjbonner opened 3 years ago

sjbonner commented 3 years ago

The current model includes individual variation in the SD of some parameters, which is causing identifiability problems. In particular, there is individual variation in the SD of the changepoints.

pmahon3 commented 3 years ago

Ahhh, ok. I will remove it. The SD for the changepoints should simply be drawn from a population-level distribution?

sjbonner commented 3 years ago

Hey Patrick, I'm still working through this, but this is one thing I noticed. Don't make any changes yet, I'll keep you posted.

pmahon3 commented 3 years ago

Ok, I won't push anything. I eliminated that variation, and the coverage seems more accurate but imprecise.

sjbonner commented 3 years ago

I'm about to push my revised version and wanted to highlight some of the changes that I've made to the model and simulation. Here's what I've done:

  1. Variation in signal strength: your model included a layer that allowed for individual variation in both the mean and variance of the signal strength. I've removed these so that the distributions of the mean and variance are the same for every individual. For example, the mean signal strength for each period was modelled hierarchically with a separate value of the mean for each bird; I removed this so that there is a single mean signal strength in each period for all birds. The reason is that the mean signal strength depends on the birds' locations and their orientation relative to the tower. If we assume that birds are moving at random then there should be no individual effect.

  2. Model on standard deviations: each standard deviation was assigned a separate prior. I've turned this into a hierarchical prior so that information about the standard deviations is shared between birds. E.g., you had assigned an independent prior to every value of sigma[i,j,p]. I've turned this into a hierarchical model by assuming that each value of log(sigma[i,j,p]) is drawn from a normal distribution with a common mean and variance, which are in turn assigned hyperpriors with fixed parameters (a rough sketch follows this list).

  3. Simulation study: when simulating data you had drawn some values from the prior distributions. E.g., the standard deviations were drawn from the prior distributions for these parameters. These prior distributions had high variance (i.e., they were non-informative) and this was creating unrealistic values for these parameters. I've modified the simulation function so that these are now generated from the log-normal model.

  4. Scale for the half-t: I've changed the scale of the half-t distributions relating to the times of the changepoints. The scale of the changepoints (time in hours) is very different from the scale of the signal strength (in decibels), and a scale of 1 for the changepoints represents very large variation. I've changed these values from 1 to 0.1, which I think makes more sense.
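For concreteness, here is a rough R sketch of points 2 and 3 as described above. All names (nBirds, nDays, nPeriods, mu.log.sigma, sd.log.sigma) are illustrative placeholders, not the identifiers used in the repository:

```r
## Point 2 (model side, described in the comments): every log(sigma[i,j,p]) is
## assumed to be drawn from a normal distribution with a common mean
## (mu.log.sigma) and SD (sd.log.sigma), and those two hyperparameters are in
## turn given priors with fixed values, so information about the SDs is shared
## across birds.

## Point 3 (simulation side): draw the SDs from that log-normal model rather
## than from the diffuse priors.
simulate_sigma <- function(nBirds, nDays, nPeriods,
                           mu.log.sigma = 0, sd.log.sigma = 0.5) {
  sigma <- array(NA, dim = c(nBirds, nDays, nPeriods))
  for (i in 1:nBirds) {
    for (j in 1:nDays) {
      for (p in 1:nPeriods) {
        sigma[i, j, p] <- exp(rnorm(1, mu.log.sigma, sd.log.sigma))
      }
    }
  }
  sigma
}

## Example: 5 birds, 3 days, 2 periods per day
sigma <- simulate_sigma(nBirds = 5, nDays = 3, nPeriods = 2)
```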

Other changes:

pmahon3 commented 3 years ago

Hi Simon,

Ok great, that clears things up for sure!

  1. Makes sense! I guess I was so focused on ensuring variation in signal to account for individuals' different propensities towards movement that I added too much variation. I guess the key parameter in this respect is restricted to just the individual variation in the signal-strength measurement variance. (Mean signal strength is not a function of an individual's 'activity' propensity, just location, which varies little with respect to the measurement resolution.)
  2. This relates to 1. I guess I thought it 'realistic' to consider that each bird has a certain inherent activity level that varies across the population: some birds are higher energy, some are lower energy.
  3. I was struggling to find a good software design that keeps the simulation and the model specification (priors) together, with a simple interface between the two.
  4. Makes sense.

Should we try to collect some longer runs at this point?

sjbonner commented 3 years ago

That sounds like a plan. I suggest that you rerun your simulation with a smaller number of replicates (maybe 25 data sets). That would give us a good idea of whether things are working or not. Will you have time before we meet next week, so that we can talk about the results then?

sjbonner commented 3 years ago

One more thing... I reduced the number of observations per bird from 600 to 60 to speed up my tests. You may want to increase this, but I think that 600 is probably more than enough. You could try something in the middle.

pmahon3 commented 3 years ago

I will definitely have time to get in enough runs by Wednesday. Are there any other summary statistics, besides coverage, spread, and exemplary chain plots, that would be useful?

pmahon3 commented 3 years ago

Ok. I found that a slightly adapted version of the model appears successful. Results are available in model version 1.2, found here. This can be compared against the previous model, model v1.1, here. The new model has coverage of:

muMuDelta: 0.88, 0.92
muDelta.prime: 0.96

The HDI typically excludes 0. Here are some representative chains; the lower and upper bounds of the 95% HDI are in dashed blue and the true value is in dashed red. The numbers above are for a run of 25 populations, but I've gotten similar results for a run of 100 populations.

[three representative trace plots]

The difference between model v1.1 and model v1.2 is that delta.prime is included in the calculation of the mean of delta2 on the final day, as opposed to being included in the model response. Here are the relevant portions of the code (a rough sketch of the two parameterisations follows the links):

v1.1: https://github.com/pmahon3/UWO-MOTUS-Analysis/blob/issue8-extra-variation/src/main/R/Population/v1.1/Model.R#L31-L33 https://github.com/pmahon3/UWO-MOTUS-Analysis/blob/issue8-extra-variation/src/main/R/Population/v1.1/Model.R#L61-L67

v1.2: https://github.com/pmahon3/UWO-MOTUS-Analysis/blob/issue8-extra-variation/src/main/R/Population/v1.2/Model.R#L31-L32 https://github.com/pmahon3/UWO-MOTUS-Analysis/blob/issue8-extra-variation/src/main/R/Population/v1.2/Model.R#L62-L67
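Schematically, the change is something like this. This is illustrative notation only, not the exact lines from the files above, and sdDelta stands in for whatever the day-to-day SD of the changepoint is called:

```
v1.1: delta.prime enters the response
  delta[i, nDays, 2] ~ Normal(muDelta[i, 2], sdDelta^2)
  changepoint on final day = delta[i, nDays, 2] + delta.prime[i]

v1.2: delta.prime enters the mean of delta2 on the final day
  delta[i, nDays, 2] ~ Normal(muDelta[i, 2] + delta.prime[i], sdDelta^2)
  changepoint on final day = delta[i, nDays, 2]
```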

pmahon3 commented 3 years ago

This newer model typically fails to capture delta2 on the final day. I haven't figured out why it does a good job of covering delta.prime, muDelta.prime, and muMuDelta[1:2] but not delta2 on the final day.

pmahon3 commented 3 years ago

So, with respect to delta2 on the final day, here is what I think might be going on.

In the simulation, delta2 and delta.prime are drawn from two separate distributions and combined to determine the period. In the model, the final delta2 fits this simulated changepoint, which is the sum of the two parameters, delta2 (simulated) and delta.prime (simulated). So delta2 is fitting delta2 (simulated) + delta.prime (simulated).

It seems that, in order to recover the simulated delta2, it is enough to say that the last delta2 is drawn from a distribution centred near the mean of all previous whole delta2s, as described by muDelta2, plus some additional delta.prime bit.

In this way the simulated delta2 on the final day can be recovered by considering the difference between the fitted final day delta2 chain and the fitted delta.prime chain.

I've run some initial calculations on this and it seems to work.
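The check looks roughly like this in R. The object names delta2.final.draws, delta.prime.draws, and delta2.true are placeholders for whatever the fitted chains and simulated value are actually called:

```r
## Recover the simulated final-day delta2 by differencing the two fitted chains
## draw-by-draw, then check whether the 95% HDI of the derived quantity covers
## the simulated (true) value.
library(coda)

check_final_delta2 <- function(delta2.final.draws, delta.prime.draws, delta2.true) {
  recovered <- delta2.final.draws - delta.prime.draws
  hdi <- HPDinterval(as.mcmc(recovered), prob = 0.95)
  c(lower   = hdi[1, "lower"],
    upper   = hdi[1, "upper"],
    covered = hdi[1, "lower"] <= delta2.true && delta2.true <= hdi[1, "upper"])
}
```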

Edit:

It looks ok. I've updated the calculation of the final-day delta2 coverage with this commit. It saves the coverage results for the delta variables in CSVs located here. For each CSV, the rows correspond to a bird and the columns correspond to a day. The final column is an average across each row and the final row is an average across each column (applied in that order).
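Roughly, each CSV is built like this (dummy values and a hypothetical file name, just to show the layout):

```r
## Rows = birds, columns = days, with marginal averages appended:
## first a final column of per-bird (row) averages, then a final row of
## per-day (column) averages, applied in that order.
coverage <- matrix(rbinom(5 * 4, 1, 0.9), nrow = 5, ncol = 4)  # dummy 5 birds x 4 days
coverage <- cbind(coverage, rowMeans(coverage))                # append per-bird averages
coverage <- rbind(coverage, colMeans(coverage))                # append per-day averages
write.csv(coverage, "delta2_coverage.csv", row.names = FALSE)
```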

sjbonner commented 3 years ago

Hey @pmahon3. I've looked at the v1.1 and v1.2 models, and you've stumbled onto the issue of hierarchical centering. The two models are actually equivalent in terms of their fit to the data. The difference is that in v1.1 you have included delta.prime[i] as a separate effect on the second changepoint on the final day, whereas in model v1.2 you have included delta.prime[i] in the mean of the second changepoint on the final day. If you compute the mean and variance of the second changepoint for the two models then you will find that they are identical, and since the normal distribution is defined by its first two moments, the models are equivalent.
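Explicitly, conditional on delta.prime and muDelta, writing t for the second changepoint on the final day, suppressing the other indices, and letting sigma^2 stand for its day-to-day variance (illustrative notation, not the names in the code):

```
v1.1:  t = delta + delta.prime,   delta ~ Normal(muDelta, sigma^2)
       =>  E[t] = muDelta + delta.prime,   Var(t) = sigma^2

v1.2:  t = delta,                 delta ~ Normal(muDelta + delta.prime, sigma^2)
       =>  E[t] = muDelta + delta.prime,   Var(t) = sigma^2
```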

What has changed is the interpretation of delta[i,nDays,2], which is why the credible intervals for this quantity no longer have appropriate coverage. In v1.1, delta[i,nDays,2] represents the time where the changepoint would be if this were any other day. The actual changepoint is then given by delta[i,nDays,2] + delta.prime[i]. In v1.2, you have used delta[i,nDays,2] to represent the actual changepoint. This will increase the value by about delta.prime[i] on average and shift the credible interval so that it no longer covers the true parameter value. Since the data was generated according to the first model, you should find that in v1.2 the credible interval should cover delta[i,nDays,2] + delta.prime[i] with approximately the correct probability.

The reason this is important is that these types of changes can affect the performance of the MCMC sampler in terms of how fast the Markov chains converge and how well they move through the parameter space. It's often difficult to predict which will be better, but there can be a marked difference in the efficiency between the two models even though they are equivalent.

So, provided that the chains are run for long enough, both should give very similar results except for the change in the interpretation of delta[i,nDays,2].

pmahon3 commented 3 years ago

I was wondering about this. I had the sense they were equivalent in some sort of limit, but wondered whether for v1.1 it was a matter of chain length or data density/abundance. I assume that with a higher density of data the model would also converge faster? I understand why v1.1 would fit well in the limit of increasing data density/abundance: muDelta[i,2] would become sufficiently narrow to pin down delta[i,nDays,2], allowing a good fit for delta.prime[i]. But how do we know the trade-off between data abundance and convergence rate? Specifically, if an upper bound on data abundance implies a lower bound on the posterior precision of muDelta[i,2], would that allow delta[i,nDays,2] and delta.prime[i] to wander perpetually in some probable density that sums to the correct changepoint, while each variable is still, individually, inaccurate and imprecise?

Also, I haven't tried this, but I assume that if we made the equivalent change on the data-simulation side we would get the same result. In other words, either model could fit either simulation method?

Lastly, does the different interpretation change how confident we can be about the conclusion? Is there a reason to prefer one over the other?

sjbonner commented 3 years ago

Hey Patrick,

You are correct that there is a limit to the precision of delta[i,nDays,2] and delta.prime[i]. No matter how much data we collect, we will never be able to pin these down exactly. The final precision is determined by the variance of delta[i,j,2] about muDelta[i,2] for j < nDays.

The problem is exactly the same as the difference between estimation of the mean and prediction of a future observation in linear regression. The variance of the mean will decrease to 0 as the amount of data increases, but the variance of the prediction is bounded below by the error in the system. No matter how much data you collect, there is always some natural variation that prevents you from predicting an outcome with perfect accuracy. Without this, we'd all know exactly what stocks to pick and I would retire tomorrow!
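In symbols, for the simplest case of an i.i.d. sample $y_1,\dots,y_n$ with error variance $\sigma^2$ (just to illustrate the point, not our model):

$$
\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{n} \to 0,
\qquad
\operatorname{Var}(y_{\text{new}} - \bar{y}) = \sigma^2\Big(1 + \frac{1}{n}\Big) \to \sigma^2 ,
$$

so the uncertainty in a prediction never drops below the natural variation $\sigma^2$, no matter how much data you collect.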

In our case, you can imagine that we are trying to predict delta[i,nDays,2] based on the previous days. We then use this value to compute delta.prime[i] by subtraction. The problem is that we can never predict delta[i,nDays,2] exactly, provided that there is day-to-day variation within an individual, and so we can never estimate delta.prime[i] exactly either. Does that make sense?

I just looked at the simulations for v1.2, and I think that the coverage of muMuDelta may actually be underestimated. In the worst case, which is the first changepoint for pop 7, the HDI fails to cover the true value by just 0.05 of an hour (3 minutes). I expect that this may be sampling error. 200 iterations is not much when it comes to estimating quantiles in the tail of a distribution. E.g., the 97.5% quantile is estimated by the point such that 2.5% of the draws lie above it, which means you are looking at the point such that 5/200 iterations are above it, and that is going to be highly variable. I expect that the coverage will improve if you run the chains for longer. Overall, I'd say that there is no evidence at the moment that the model isn't working.
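As a quick illustration of how variable that is, here is a small R check using independent standard-normal draws just for scale (MCMC draws are autocorrelated, so the real situation is somewhat worse):

```r
## Monte Carlo variability of the 97.5% quantile estimated from only 200 draws.
set.seed(1)
q.hat <- replicate(1000, quantile(rnorm(200), 0.975))
sd(q.hat)  # the spread of the estimated upper endpoint is far from negligible
```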

I haven't started on the alternative formulation of the model for now. We can chat about this tomorrow, but what I would really like to do is to start looking at some real data. I will contact Yolanda and hopefully we can talk about that tomorrow too.

See you then,

Simon



pmahon3 commented 3 years ago

Hi Simon,

Sounds good. That is helpful, thanks!

I guess one more way I'd like to ask the same question is: holding everything constant except the model code itself, if v1.2 converges, does that imply that v1.1 must eventually converge?

See you tomorrow