jrfaulkner / spmrf

R package for Bayesian nonparametric adaptive smoothing with Stan
GNU Lesser General Public License v3.0

spline model with measurement errors #6

Open pgajer opened 6 years ago

pgajer commented 6 years ago

Hi Jim,

I am working on a Bayesian locally-adaptive nonparametric smoothing logistic regression model accounting for measurement errors.

The likelihood loop of the model has the form

for (i in 1:N)
{
    z[i] ~ normal(-5, 5);
    x[i] ~ normal(z[i], sigma[i]);
    y[i] ~ bernoulli_logit(theta[i]);
}

where x is the input variable (called xvar1 in your code) and sigma[i] is the standard deviation of the measurement error in the observed value x[i], so z[i] is the true value underlying x[i].

Assuming that all z[i]'s are unique, one can create a variable dz (duxvar1 in your code)

for (j in 1:(N-1))
    dz[j] = z[j+1] - z[j];

and use it in the spline section of the transformed parameters block of your code. Unfortunately, the z[i]'s will not be sorted, and your code seems to require xvar1, and therefore the z[i]'s, to be sorted.

One can of course sort z using sort_asc() in Stan, but if z gets sorted, then the same ordering needs to be applied to the response variable y. This is where the problem arises, as Stan does not allow int variables in the transformed parameters block.
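The integer restriction applies to variables declared at the top level of transformed parameters; integer local variables are allowed inside the model block, so the permutation can be computed there and applied to y. The following is a minimal sketch only, assuming current Stan array syntax. The names `ord`, `zs`, and `tau` are illustrative, and the random-walk prior is a simple placeholder, not spmrf's shrinkage prior. The ordering is a piecewise-constant function of z, so the posterior is discontinuous wherever two z values cross, and the sampler may struggle.

```stan
data {
  int<lower=2> N;
  vector[N] x;                        // observed covariate
  vector<lower=0>[N] sigma;           // known measurement-error SDs
  array[N] int<lower=0, upper=1> y;   // binary response
}
parameters {
  vector[N] z;                        // latent true covariate values
  vector[N] theta;                    // trend on the logit scale, indexed by rank of z
  real<lower=0> tau;                  // placeholder smoothing scale (assumption)
}
model {
  // Local variables may be integers, unlike top-level transformed parameters.
  array[N] int ord = sort_indices_asc(z);        // permutation that sorts z
  vector[N] zs = z[ord];                         // sorted latent values
  vector[N - 1] dz = zs[2:N] - zs[1:(N - 1)];    // spacings between neighbours

  z ~ normal(-5, 5);
  x ~ normal(z, sigma);

  // Placeholder first-order random walk whose step SD grows with the spacing.
  tau ~ normal(0, 1);
  theta[1] ~ normal(0, 5);
  theta[2:N] ~ normal(theta[1:(N - 1)], tau * sqrt(dz));

  // Apply the same permutation to the response.
  y[ord] ~ bernoulli_logit(theta);
}
```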

I wonder if you have any suggestions how to modify your code so that measurement error can be accounted for.

Thanks ! :) pawel

jrfaulkner commented 6 years ago

Pawel, this is a difficult problem that will take some thought and a little experimentation. The sorting is necessary because of the Markov properties of the model: we need it to create a grid over the covariate values, and that grid has unequal spacing. The problem arises when the ordering of the z values changes due to the measurement-error uncertainty, which is akin to a label-switching problem. I don't see an obvious answer right now given the current setup of the model, but I will give it some thought.

Jim

jrfaulkner commented 6 years ago

One possible solution would be to define a fixed grid for the trend parameters and then let the z's and their associated y's move among the grid cells. I think we could use local variables in Stan to get around the discrete stochastic indexing problem. Let me see if I can come up with some code for a simple example.
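As a rough illustration of the fixed-grid idea, the sketch below puts the trend on K equal-width cells and finds each z[i]'s cell with a local integer inside the model block. It is not code from spmrf: the grid inputs `K`, `lo`, and `h` are hypothetical, and a plain Gaussian random walk stands in for the package's shrinkage priors. Because theta is constant within a cell, the likelihood jumps when a z[i] crosses a cell boundary, so this may sample poorly without further work (for example, interpolating between neighbouring cells).

```stan
data {
  int<lower=1> N;
  vector[N] x;                        // observed covariate
  vector<lower=0>[N] sigma;           // known measurement-error SDs
  array[N] int<lower=0, upper=1> y;   // binary response
  int<lower=2> K;                     // number of grid cells (hypothetical input)
  real lo;                            // left edge of the grid (hypothetical input)
  real<lower=0> h;                    // cell width (hypothetical input)
}
parameters {
  vector<lower=lo, upper=lo + K * h>[N] z;   // latent values kept inside the grid
  vector[K] theta;                           // trend value per grid cell, logit scale
  real<lower=0> tau;                         // placeholder smoothing scale
}
model {
  z ~ normal(-5, 5);
  x ~ normal(z, sigma);

  // Placeholder first-order random walk over the fixed, equally spaced grid.
  tau ~ normal(0, 1);
  theta[1] ~ normal(0, 5);
  theta[2:K] ~ normal(theta[1:(K - 1)], tau);

  for (i in 1:N) {
    int k = 1;   // local integer: index of the cell containing z[i]
    while (k < K && z[i] >= lo + k * h) {
      k += 1;
    }
    y[i] ~ bernoulli_logit(theta[k]);
  }
}
```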