probcomp / Gen.jl

A general-purpose probabilistic programming system with programmable inference
https://gen.dev
Apache License 2.0

Averaging over Observations per Trace? #308

Closed collinskatie closed 2 years ago

collinskatie commented 4 years ago

Hi,

Is there a way to average over the sampling of a set of observations for a single trace when scoring that trace (for instance, in the update step, or when deciding whether to accept/reject a change)?

Our model has noise within the collision dynamics of the physics engine, so the sampled trajectory of a ball varies from run to run (we trace the x and y per timepoint, assuming they are sampled from a Gaussian centered at that x,y position). However, if the noise "accidentally" makes the trajectory match the observations very closely, that trace will dominate our set of sampled traces, when in reality it is only a coincidental "good fit" to the observed ball trajectory we are trying to match.
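
For concreteness, the traced part of our model looks roughly like this sketch (simulate_physics, T, and measurement_noise are placeholder names standing in for our actual code):

# One forward run of the noisy physics engine; the x,y at each timepoint
# are traced from Gaussians centered at the simulated positions.
@gen function ball_model(T, measurement_noise)
    trajectory = simulate_physics(T)  # internal collision noise, untraced
    for t in 1:T
        (x, y) = trajectory[t]
        {(:x, t)} ~ normal(x, measurement_noise)
        {(:y, t)} ~ normal(y, measurement_noise)
    end
end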

Given this noise, is there a way during inference to average over multiple runs of our physics engine per trace? For context, our generative model runs this forward once and traces the x,y at each step. We have also tried running multiple chains, but still within some chains, one trace dominates given the "coincidental" noise fit.

I'm happy to provide any clarification to the question if needed - thank you for any help!

marcoct commented 4 years ago

@collinskatie That's a really good question, and it is closely related to "pseudo-marginal" Monte Carlo inference algorithms, which use stochastic approximations to marginal likelihoods, i.e. to integrals over the values of encapsulated random choices like your dynamics noise. Every operation (including update) triggers a fresh simulation of these choices, and this can be seen as a single-sample importance sampling estimate of the marginal likelihood (the integral over the encapsulated randomness in the dynamics). The GFI was specifically designed to allow you to increase the number of replicates used to estimate these integrals without needing to change your inference code, so it's really cool to see that you have a need for this.
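
As a sketch of the idea (simulate_dynamics and observation_logpdf are hypothetical stand-ins for your physics engine and your Gaussian observation model, not Gen API):

using Gen: logsumexp  # numerically stable log-sum-exp, exported by Gen

# N-sample Monte Carlo estimate of the log marginal likelihood of the
# observations y, where z is the simulator's encapsulated randomness:
#   p(y) = E_{z ~ p(z)}[ p(y | z) ]  ≈  (1/N) * sum_i p(y | z_i)
# N = 1 is what a single operation does today; larger N lowers the variance.
function estimate_log_marginal(y, N::Int)
    ls = [observation_logpdf(y, simulate_dynamics()) for _ in 1:N]
    return logsumexp(ls) - log(N)  # log of the average likelihood
end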

There are a couple of ways to implement this so that N runs of the simulator, instead of 1 run, are used within every call to e.g. update.

It would actually be possible to use a generic generative function combinator that wraps your existing model and accepts the number of replicates to use for each operation, but unfortunately we don't have a version of that combinator ready to be released yet.

In lieu of that combinator, it should be possible to extend your model implementation to have this behavior:

const N = 10 # number of simulations that will be used within every model operation
const mixture_of_normals = Mixture([normal for _ in 1:N])

..
@gen function model(..)
    ..
    simulation_results::Vector = run_replicated_simulation(.., N)

    for object_id in objects
        {(:x, t, object_id)} ~ mixture_of_normals([fill(1/N, N)], [(get_object_x(simulation_results[i], object_id), noise) for i in 1:N])
        {(:y, t, object_id)} ~ mixture_of_normals([fill(1/N, N)], [(get_object_y(simulation_results[i], object_id), noise) for i in 1:N])
    end
    ..
end
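
With equal weights 1/N, the mixture density at the observed value is the average of the N per-replicate Gaussian likelihoods, which is exactly an N-sample estimate of the marginal likelihood over the dynamics noise, so every call to update (and every accept/reject decision) scores the trace against N simulations instead of 1.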

I suppose it would be better to make N an argument to the model. It is also fine to dynamically construct the distribution within the model code if you are using the dynamic modeling language, e.g.:

{(:x, t, object_id)} ~ (Mixture([normal for _ in 1:N]))([fill(1/N, N)], ... )

Of course, you can also decide not to use Mixture and write your own custom mixture distribution that is more specialized for your use case.

Here is the definition of Mixture:

using Gen

# A general mixture distribution: the components share an output type T, and
# the weights and per-component argument tuples are passed at each call.
struct Mixture{T} <: Distribution{T}
    components::Vector{Distribution{T}}
end

# log p(x) = logsumexp_i ( log(weights[i]) + log p_i(x | arg_tuples[i]...) )
function Gen.logpdf(
        dist::Mixture{T}, x::T, weights::Vector{Float64},
        arg_tuples::Vector) where {T}
    ls = Vector{Float64}(undef, length(dist.components))
    for i=1:length(dist.components)
        ls[i] = logpdf(dist.components[i], x, arg_tuples[i]...) + log(weights[i])
    end
    return logsumexp(ls)
end

# sample a component index from the weights, then sample from that component
function Gen.random(
        dist::Mixture, weights::Vector{Float64},
        arg_tuples::Vector)
    i = categorical(weights)
    return random(dist.components[i], arg_tuples[i]...)
end

# gradients are not supported by this version
function Gen.logpdf_grad(
        dist::Mixture{T}, x::T, weights::Vector{Float64},
        arg_tuples::Vector) where {T}
    error("not implemented")
end

(dist::Mixture)(weights, arg_tuples) = random(dist, weights, arg_tuples)
Gen.is_discrete(dist::Mixture) = is_discrete(dist.components[1])
Gen.has_output_grad(dist::Mixture) = false
Gen.has_argument_grads(dist::Mixture) = (false, false)
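
As one sketch of the kind of specialization mentioned above (hypothetical code, not part of Gen): an equally weighted mixture of normals that share a single noise parameter, so the model does not need to build weight and argument vectors at every trace operation. Something like:

struct HomogeneousNormalMixture <: Distribution{Float64} end
const eq_normal_mixture = HomogeneousNormalMixture()

# log p(x) = log( (1/N) * sum_i Normal(x; means[i], noise) )
function Gen.logpdf(::HomogeneousNormalMixture, x::Float64,
        means::Vector{Float64}, noise::Float64)
    ls = [logpdf(normal, x, mu, noise) for mu in means]
    return logsumexp(ls) - log(length(means))
end

# pick a component uniformly at random, then sample from it
function Gen.random(::HomogeneousNormalMixture,
        means::Vector{Float64}, noise::Float64)
    return random(normal, means[uniform_discrete(1, length(means))], noise)
end

function Gen.logpdf_grad(::HomogeneousNormalMixture, x, means, noise)
    error("not implemented")
end

(d::HomogeneousNormalMixture)(means, noise) = random(d, means, noise)
Gen.is_discrete(::HomogeneousNormalMixture) = false
Gen.has_output_grad(::HomogeneousNormalMixture) = false
Gen.has_argument_grads(::HomogeneousNormalMixture) = (false, false)

# usage in the model would then look like, e.g.:
# {(:x, t, object_id)} ~ eq_normal_mixture([get_object_x(simulation_results[i], object_id) for i in 1:N], noise)
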
collinskatie commented 4 years ago

Hi @marcoct - thank you so much!! A mixture of normals is exactly what we're looking for.

I pasted in the Mixture struct you wrote, started adapting our generative model as you suggested, and tried adapting some of your code snippets, but I'm having some trouble creating the mixture of normals with the struct as-is.

Is the way you recommended the best way to create the struct once it's defined? I'm not exactly sure what type the Mixture struct is expecting when we construct it (I think I'm still a bit confused by some of the Gen syntax).

Here is the error with the command I ran:

julia> const mixture_of_normals = Mixture([normal for _ in 1:N])
ERROR: MethodError: no method matching Mixture(::Array{Gen.Normal,1})
Closest candidates are:
  Mixture(::Array{Distribution{T},1}) where T at none:2
Stacktrace:
 [1] top-level scope at none:0

I'm a bit confused about how [normal for _ in 1:N] would not satisfy the array-of-distributions type the struct seems to want?

Thanks!

marcoct commented 4 years ago

@collinskatie This should work:

 Mixture{Float64}([normal for _ in 1:10])
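
For context (this is standard Julia behavior, nothing Gen-specific): parametric types are invariant, so the comprehension [normal for _ in 1:N] produces a Vector{Gen.Normal}, which is not a subtype of the Vector{Distribution{Float64}} that the components field requires. Writing Mixture{Float64} explicitly lets the default constructor convert the vector to the field's type:

julia> [normal for _ in 1:3] isa Vector{Distribution{Float64}}
false

julia> Mixture{Float64}([normal for _ in 1:3]).components isa Vector{Distribution{Float64}}
true
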
collinskatie commented 4 years ago

Perfect yes that works - thanks!!

collinskatie commented 4 years ago

Hi @marcoct - sorry for all of the questions. I restructured all of our code and have nearly everything working, except the actual sampling for the trace. It seems the logpdf argument types aren't matching up exactly as defined? I've been trying to parse the error message, and it seems to me that the types are what the parameters of the logpdf method you defined for the Mixture struct expect?

{:x} ~ Mixture{Float64}([normal for _ in 1:N])([fill(1/N, N)], [(xs[i], measurement_noise) for i in 1:10])

ERROR: LoadError: MethodError: no method matching logpdf(::Mixture{Float64}, ::Float64, ::Array{Array{Float64,1},1}, ::Array{Tuple{Float64,Float64},1})

Do you have a sense of what may be going on here? I will keep looking regardless, but I just wanted to check whether there is an obvious Gen or Julia error that I'm missing. xs here is an array of floats, and measurement_noise is a float as well.

Thanks!

alex-lew commented 4 years ago

@collinskatie I think the problem may be that you have square brackets around fill(1/N, N) -- the fill function already produces a vector, so you should be OK removing them :-) (That is what the MethodError is pointing at: the third argument has type Array{Array{Float64,1},1}, whereas logpdf expects the weights as a Vector{Float64}.)

{:x} ~ Mixture{Float64}([normal for _ in 1:N])(fill(1/N, N), [(x, measurement_noise) for x in xs])
collinskatie commented 4 years ago

Thank you @alex-lew !! That worked!