MariaLavrovskaya opened this issue 2 years ago
I think you just need to change the axis argument to tf.stack from 0 (left-most) to -1 (right-most).
When you pass a sample_shape to sample, the output of tfd.Uniform(..).sample(...) will take that shape. In this case, with sample_shape = 1, the uniform has shape [1]. Then when you tf.stack([e, 1-e], 0), you get shape [2, 1]. But what you want here is shape [1, 2]. Passing a shape [2, 1] probs array to Categorical will actually result in a batch of 2 independent, 1-category Categoricals -- we'll get 2 arrays out and all the samples will be 0! If we instead stack on axis -1, we'll get probs of shape [1, 2], which has the right number of categories (innermost shape), and will yield a shape [500, 1] output, which we can then sum over axis 0 in the last distribution to get the right output.
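To make the shapes concrete, a minimal sketch (e drawn from a Uniform with sample_shape=1, as above):

e = tfd.Uniform(low=0., high=1.).sample(1)   # shape [1]
tf.stack([e, 1. - e], axis=0)                # shape [2, 1] -> batch of 2 one-category Categoricals
tf.stack([e, 1. - e], axis=-1)               # shape [1, 2] -> batch of 1 two-category Categorical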
Hope this makes some sense!
Thank you so much. Totally makes sense! Should have spotted it myself
Apologies for bothering you again, but I am having trouble getting the dims right for the Dirichlet-Multinomial model and generalising it so I can run MCMC on it. I've managed to build the function for one sample, but I cannot figure out how to generalise it to the batched version. The idea is that every time we generate a sample, we want to build two Multinomial distributions. I cannot figure out what the dimensions are supposed to look like for the posterior. I have tried adding an axis, but it does not seem to help.
I have also tried printing out everything, but that made me even more confused, because every time the helper functions run they seem to operate on one sample at a time. When we call sample, it is vectorised, right? So both distributions for each of the samples should be built at the same time?
My understanding is that for a sample of 2, we need the priors to be of shape (2,2,5) and the counts of shape (2,2), however I think I am missing something because it does not work. Any hints please?
#means = [0.0623815 , 0.4076164 , 0.49531835, 0.01912234, 0.0155614 ]
#bac = [0.9 , 0.1 , 0.05, 0.05, 0.1 ]

def _get_probs(priors):
    answer = tf.stack([tf.reshape(priors, [5]),
                       tf.constant([.9, 0.1, 0.05, 0.05, 0.1], dtype=tf.float32)], 0)
    print(priors.shape, 'pass', answer.shape)
    return answer

def _get_counts(vec):
    answer = tf.cast(tf.stack([tf.reduce_sum(tf.cast(vec == 1, tf.float32), 0),
                               tf.reduce_sum(tf.cast(vec == 0, tf.float32), 0)], 0),
                     dtype=tf.float32)
    print('pass', answer.shape)
    return answer

def simplified_model(means, bac):
    return tfd.JointDistributionSequential([
        tfd.Uniform(low=0., high=1.),  # b
        lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1)),
                             sample_shape=[500], name='count'),  # count
        tfd.Dirichlet(concentration=means, name='priors'),  # priors
        lambda priors, count: tfd.Multinomial(
            probs=_get_probs(priors),
            total_count=tf.cast(
                tf.stack([tf.reduce_sum(tf.cast(count == 1, tf.float32)),
                          tf.reduce_sum(tf.cast(count == 0, tf.float32))], 0),
                dtype=tf.float32))  # we can replace this with a categorical distribution
    ])
Couple of things going on here:
2b is easiest, but it was pretty slow when I ran it. This is probably due to Multinomial being...pretty terrible! JDSAutoBatched uses tf.vectorized_map to do some auto-batching magic, which involves some compilation/program transformation that may be ill-suited to Multinomial's implementation. That said, it may be hard to do better vectorizing by hand. I'd suggest trying both (feel free to pop back in here w/ questions!), and see which is best for your purposes.
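For reference, the auto-batched variant mentioned above is a separate class that can usually be dropped in place of JointDistributionSequential; a toy sketch (not the model from this thread):

jds_ab = tfd.JointDistributionSequentialAutoBatched([
    tfd.Normal(0., 1., name='x'),
    lambda x: tfd.Normal(x, 1., name='y'),
])
jds_ab.sample(5)  # auto-vectorizes over the leading sample dimension, no manual shape bookkeeping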
HTH!
Thank you for being so supportive – it is much appreciated, especially because TensorFlow is a bit hard to wrap my mind around!
I have decided to go with the reshape first, because I thought it would be a good exercise to understand shapes better. As far as I understand, for sample_size > 1 (let's take 2), the final shape of the Multinomial is supposed to be (2,2,5) -> 4 distributions in total, two for each sample. Probs should be of size (2,2,5) and total counts should be of size (2,2). It's definitely not the most elegant way to go about it, but it's definitely a step forward.
Will now look at 2b as suggested and try running MCMC for both.
def _get_probs_v2(non_claim_means, priors):
    return tf.concat([priors, non_claim_means], 1)  # changed from 0

def _get_counts(count, priors):
    multiple = tf.cast(tf.shape(priors)[0], dtype=tf.int32)
    count = tf.reshape(count, [multiple, 500])  # 500 will be an RV in itself
    return tf.stack([tf.reduce_sum(tf.cast(count == 1, tf.float32), axis=1),
                     tf.reduce_sum(tf.cast(count == 0, tf.float32), axis=1)], -1)

def simplified_model(means_2, means):
    return tfd.JointDistributionSequential([
        tfd.Uniform(low=0., high=1.),  # b
        lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1)),
                             sample_shape=[500], name='count'),  # count
        tfd.Dirichlet(concentration=means, name='priors'),  # priors
        tfd.Dirichlet(concentration=means_2, name='means_b'),
        lambda means_b, priors, count: tfd.Multinomial(
            probs=_get_probs_v2(means_b, priors),
            total_count=_get_counts(count, priors))
    ])
FWIW, if numpy is more familiar, you could use the same TFP API in JAX instead: from tensorflow_probability.substrates import jax as tfp
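For instance, a minimal sketch (the main difference to remember is that sampling in the JAX substrate takes an explicit seed):

import jax
from tensorflow_probability.substrates import jax as tfp
tfd = tfp.distributions

# Sampling needs an explicit PRNG key in JAX.
samples = tfd.Uniform(low=0., high=1.).sample([3], seed=jax.random.PRNGKey(0))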
Thank you both for your help!
I am trying to run the MCMC sampler based on the run function shown here. I have adapted my code and found appropriate unconstraining bijectors to map the initial state.
However, the algorithm fails because we are getting NaN values now and then. I am very confused, and this time I have no clue whether there is a workaround to avoid the NaN values. Am I doing something wrong, or is there a trick to overcome this?
The logic is as follows. The Beta distribution controls the proportion of 0 values in the generated sample. The Dirichlet for 'first seen' controls the count for each of the possible values occurring in the 'first seen' sample, and the Dirichlet for 'already seen' controls the count for each of the 5 possible values occurring in the 'already seen' sample. They both have hyperparameters coming from the data.
def _get_probs_v2(already_seen, priors):
    print('prior_shape', priors.shape)
    return tf.concat([priors, already_seen], 1)  # changed from 0

def _get_counts(count, priors):
    multiple = tf.cast(tf.shape(priors)[0], dtype=tf.int32)
    count = tf.reshape(count, [multiple, 50000])
    return tf.stack([tf.reduce_sum(tf.cast(count == 1, tf.float32), axis=1),
                     tf.reduce_sum(tf.cast(count == 0, tf.float32), axis=1)], -1)

def simplified_model(first_seen, already_seen):
    return tfd.JointDistributionSequential([
        tfd.Beta(concentration1=559.49, concentration0=291.28),  # b
        lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1)),
                             sample_shape=[50000], name='count'),  # count
        tfd.Dirichlet(concentration=first_seen, name='priors'),  # priors
        tfd.Dirichlet(concentration=already_seen, name='already_seen'),
        lambda already_seen, priors, count: tfd.Independent(
            tfd.Multinomial(probs=_get_probs_v2(already_seen, priors),
                            total_count=_get_counts(count, priors)),
            reinterpreted_batch_ndims=2)  # working_function
    ])

@tf.function
def simplified_log_prob(init_beta, init_first, init_already):
    return simplified_model(first_seen, already_seen).log_prob(
        [init_beta, init_first, init_already, input_excess])

def sample_simplified(num_results, num_burnin_steps):
    hmc = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=simplified_log_prob,
        num_leapfrog_steps=10,
        step_size=0.005)

    initial_state = [
        tf.ones([1], name='init_beta'),
        tf.constant(first_seen, name='init_first_seen'),
        tf.constant(already_seen, name='init_already_seen')
        # means * tf.ones([], dtype=tf.float32, name="init_means")
    ]

    unconstraining_bijectors = [
        tfb.Sigmoid(),          # to map real values to something between 0 and 1
        tfb.SoftmaxCentered(),  # Dirichlet
        tfb.SoftmaxCentered()
    ]

    kernel = tfp.mcmc.TransformedTransitionKernel(
        inner_kernel=hmc, bijector=unconstraining_bijectors)

    samples, kernel_results = tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=initial_state,
        kernel=kernel)

    acceptance_probs = tf.reduce_mean(
        tf.cast(kernel_results.inner_results.is_accepted, tf.float32), axis=0)
    return samples, acceptance_probs

first_seen = tf.constant([[0.06225592, 0.40752962, 0.4955014, 0.01913294, 0.01558013]], dtype=tf.float32)
already_seen = tf.constant([[0.9, 0.1, 0.05, 0.05, 0.1]], dtype=tf.float32)
Thank you again for being very supportive.
Can you add allow_nan_stats=False to all distribution constructors, and validate_args=True to all distribution and bijector constructors, and run again? You may get more informative errors and/or find the source of the NaNs sooner (I haven't had a close look at the latest code yet).
(No need to add these to the args of tfd.Sample or tfd.Independent -- just the "concrete" distributions these are wrapping.)
Based on the first error that I am getting, the issue seems to be in the first helper function. Will try getting to the bottom of it.
Changed the first helper so it would not raise any errors. Ran sample() a number of times and no errors are raised, however I still see a lot of inf values.
def _get_probs_v2(already_seen, priors):
    multiple = tf.cast(tf.shape(priors)[0], dtype=tf.int32)
    priors = tf.reshape(priors, [multiple, 1, 5])
    already_seen = tf.reshape(already_seen, [multiple, 1, 5])
    return tf.concat([priors, already_seen], 1)

def _get_counts(count, priors):
    multiple = tf.cast(tf.shape(priors)[0], dtype=tf.int32)
    count = tf.reshape(count, [multiple, 50000])
    return tf.stack([tf.reduce_sum(tf.cast(count == 1, tf.float32), axis=1),
                     tf.reduce_sum(tf.cast(count == 0, tf.float32), axis=1)], -1)

def simplified_model(first_seen, already_seen):
    return tfd.JointDistributionSequential([
        tfd.Beta(concentration1=559.49, concentration0=291.28,
                 allow_nan_stats=False, validate_args=True),  # b
        lambda b: tfd.Sample(tfd.Categorical(probs=tf.stack([b, 1. - b], -1),
                                             allow_nan_stats=False, validate_args=True),
                             sample_shape=[50000], name='count'),  # count
        tfd.Dirichlet(concentration=first_seen, name='priors',
                      allow_nan_stats=False, validate_args=True),  # priors
        tfd.Dirichlet(concentration=already_seen, name='already_seen',
                      allow_nan_stats=False, validate_args=True),
        lambda already_seen, priors, count: tfd.Independent(
            tfd.Multinomial(probs=_get_probs_v2(already_seen, priors),
                            total_count=_get_counts(count, priors)),
            reinterpreted_batch_ndims=2)  # working_function
    ])

@tf.function
def simplified_log_prob(init_beta, init_first, init_already):
    return simplified_model(first_seen, already_seen).log_prob(
        [init_beta, init_first, init_already, input_exc])

first_seen = tf.constant([[0.06225592, 0.40752962, 0.4955014, 0.01913294, 0.01558013]], dtype=tf.float32)
already_seen = tf.constant([[0.78, 0.1, 0.5, 0.5, 0.02]], dtype=tf.float32)
Hmm. I think you can rewrite the 2-category Categorical + sum to be a tfd.Binomial(total_count=50000).
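A toy illustration of that equivalence (hypothetical values; counting the 1s among N two-category Categorical draws has the same distribution as a single Binomial draw):

b, n = 0.3, 50000
# N draws from a two-category Categorical, then count the 1s...
cat_ones = tf.reduce_sum(
    tf.cast(tfd.Categorical(probs=[1. - b, b]).sample(n) == 1, tf.float32))
# ...which is Binomial(n, b)-distributed, i.e. the same as one draw from:
binom_ones = tfd.Binomial(total_count=float(n), probs=b).sample()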
I'm confused by the log_prob call in simplified_log_prob because it is only passing 4 values but the JointDistribution has 5 random variables (beta, sample(categorical), dir1, dir2, multinomial).
If you can provide a runnable example in a Colab, I can try and debug more proactively.
Yes, sure. https://colab.research.google.com/drive/1vSTW49QrkdIC-ciELe8nZ98N8tZefJSQ?usp=sharing
It seems like the errors are coming from the Dirichlet distributions if I use log_prob() without rewriting it.
I first rewrote simplified_log_prob() as mentioned above, because I thought for MCMC we need to define a closure over our joint to output the probability of our data. However, I took a step back in this example and ran the predefined log_prob() to trace the error step by step.
Update: the inf issue is now sorted. I was incorrectly parametrising the Dirichlet distributions with proportions, whereas they take mean counts. Trying to figure out how to rewrite the log_prob manually, as the out-of-the-box one fails: it is expecting shape (n,n), whereas we need (n,2), n being the number of generated samples. Is there a way to do that by passing the params into the JointDistributionSequential? Will report my progress along the way.
Update: I have skimmed through JointDistributionSequential() again and realised that in order for me to use log_prob(), I need to refactor the dimensions of the Dirichlet distributions. Will try this.
Almost giving up :( I slightly simplified the function to use a Binomial instead of a Categorical (it's version 2), and right now log_prob() seems to be working for sample_size up to 2, but it starts failing for samples larger than 2. I am not sure if I can change the params somewhere so I would not need to write a log_prob() from scratch, as I was trying not to use a handwritten approach from the very start. Would you mind having a look if you have time please? I'd greatly appreciate any suggestions! https://colab.research.google.com/drive/1vSTW49QrkdIC-ciELe8nZ98N8tZefJSQ?usp=sharing
I believe this will do what you want:
def simplified_model_v2(first_seen, already_seen):
    return tfd.JointDistributionSequential([
        tfd.Beta(concentration1=559.49,
                 concentration0=291.28),  # b
        lambda b: tfd.Binomial(probs=b, total_count=50000, name='count'),
        tfd.Independent(
            tfd.Dirichlet(concentration=tf.stack([first_seen, already_seen], axis=0)),
            reinterpreted_batch_ndims=1),
        lambda t, count: tfd.Independent(
            tfd.Multinomial(
                probs=t,
                total_count=tf.stack([count, 50000 - count], axis=-1)),
            reinterpreted_batch_ndims=1)
    ])
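For a quick shape check of the above, a sketch under the assumption that first_seen and already_seen are plain length-5 concentration vectors (the numbers below are placeholders):

first_seen = tf.constant([6.2, 40.8, 49.5, 1.9, 1.6])
already_seen = tf.constant([78., 10., 5., 5., 2.])
jds = simplified_model_v2(first_seen, already_seen)
b, count, dirichlets, multinomials = jds.sample()
# dirichlets and multinomials both have shape [2, 5]: one row per Multinomial.
print(jds.log_prob([b, count, dirichlets, multinomials]))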
There are a couple of things I needed to fix.
Sorry about the challenges! I hope the next step is easier. I admire your perseverance trying to make it work w/out the AutoBatched magic :) It's also my personal preference, even though it's sometimes painful.
Morning! Apologies, I was taking a break over the weekend. This works smoothly – thank you so much.
I noticed the error in (1) later on Friday, and it was stupid of me to assume that it would work any differently.
I cannot agree with you more. It's definitely sometimes painful and frustrating, but it is the best way to learn (as long as I can pop in here now and then to ask questions). 😸
I think I managed to make MCMC work. I took the values of simplified_model_v2(first_seen, already_seen).sample() to be the initial_state and then imposed bijectors to map them to unconstrained space (hopefully I did it correctly). I also wrote a partial application for the target log prob function:
@tf.function
def simplified_log_prob(init_beta, init_binom, init_dir):
    return simplified_model_v2(first_seen, already_seen).log_prob(
        [init_beta, init_binom, init_dir, inp_exc])

def sample_simplified(num_results, num_burnin_steps):
    hmc = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=simplified_log_prob,
        num_leapfrog_steps=10,
        step_size=0.005)

    initial_state = [
        tf.constant(init_b, dtype=tf.float32, name='init_beta'),
        tf.constant(init_binom, name='init_binomial'),
        tf.constant(init_dirichlets, name='init_dir')
        # means * tf.ones([], dtype=tf.float32, name="init_means")
    ]

    unconstraining_bijectors = [
        tfb.Sigmoid(),          # bijector for Beta
        tfb.Identity(),         # bijector for Binomial
        tfb.SoftmaxCentered()   # bijector for Dirichlets
    ]

    kernel = tfp.mcmc.TransformedTransitionKernel(
        inner_kernel=hmc, bijector=unconstraining_bijectors)

    samples, kernel_results = tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=initial_state,
        kernel=kernel)

    acceptance_probs = tf.reduce_mean(
        tf.cast(kernel_results.inner_results.is_accepted, tf.float32), axis=0)
    return samples, acceptance_probs
Thank you so much for being so supportive and guiding me!
I now need to introduce more complexity into the model by imposing another distribution on the total_count for the Binomial, but hopefully I will manage to make it work smoothly myself 😅
Glad to help, and glad you are continuing to make progress. I think your next challenge may be that Binomial is discrete so HMC will not work for this latent variable. You may be able to analytically marginalize out some of the discrete variables, or else do some kind of hybrid metropolis-within-gibbs thing.
I decided to take a step forward and try to generalise it to fit a number of values at a time. Right now we are fitting only one value at a time; what I want is to fit 7 values at a time. So, for example, from one draw of the Beta, the Binomial and the two Dirichlets, we pass 7 samples into the Multinomial, assuming that each of them came from this generating process.
So based on the explanations here and your valuable comments, I am trying to think about shapes really hard. My understanding is as follows:
I started off by thinking that there are three possible scenarios.
Number one relies on log_prob. This looks more intuitive, however I am not sure how I am supposed to transform the probs so that it works as desired. My way of thinking was to add a dimension with t[:, tf.newaxis, ...], but this fails for the forward pass [2,5].
Number two, which I now understand is wrong, was to do the following with log_prob, so this is probably not the right way to go forward.
Number three works through the probs() function, however I wonder if there is a neater way to do it. I have not tried the MCMC algorithm on this yet, but sample() and log_prob() work for both. Do you think my understanding is correct and (3) gives the right shapes I am supposed to be getting? Can you please say whether my logic is right and how you would reformulate probs in a 'cleaner' way?
I put everything in the notebook to make it a bit more readable: https://colab.research.google.com/drive/1vSTW49QrkdIC-ciELe8nZ98N8tZefJSQ?usp=sharing
Instead of summing a bunch of two way categoricals, consider using Binomial.
Sorry, I have been working on a deadline and haven't had time to respond...I will hopefully have some more time in a few days.
@brianwa84 I think we arrived at your suggestion a few comments back in the thread -- maybe you missed that?
No worries. I hope you will meet your deadline! I believe it works as intended 😸 I have another question that is slightly different to the thread's topic. Should I create a separate ticket for that?
I have seen in the tutorials that there is a two-Poisson model with two different rate parameters; is there a way to do this with two distributions from the exponential family that are parametrised differently? I am not sure if I am even allowed to do that. I believe I have a model with a LogNormal body and, after a certain 'switchpoint', an exponential tail. So far I have thought of using 'Mixture', however I am worried that it's not going to give me this 'switchpoint' behaviour. Any suggestions are very much welcomed!
No worries, I sent that response on Sep 26 by email, for some reason it only just showed up.
Hi. Would you mind advising me on this please? I want to create a mixture model with two normals, where all the parameters have Uniform priors. I have implemented the model using JointDistribution, and log_prob and sample do work; however, when I run the MCMC sampler, it just stays on the initialised values and does not move. I also used a bijector to map the constrained intervals to unconstrained space, but I am not sure what is going wrong. Would you mind giving me a hand please?
def affine(rv):
    return tf.transpose(rv * tf.ones([3])[..., tf.newaxis])

def model_normal_normal():
    return tfd.JointDistributionSequential([
        tfd.Uniform(low=0.08, high=0.11),  # weights
        tfd.Uniform(low=4.8, high=5.1),    # loc_body
        tfd.Uniform(low=6.4, high=7.3),    # loc_tail
        tfd.Uniform(low=0.08, high=0.1),   # scale_body
        tfd.Uniform(low=0.1, high=0.2),    # scale_tail
        lambda scale_tail, scale_body, loc_tail, loc_body, weights: tfd.Independent(
            tfd.Sample(
                tfd.Mixture(
                    cat=tfd.Categorical(
                        probs=tf.stack([affine(1 - weights), affine(weights)], axis=-1),
                        validate_args=True, allow_nan_stats=False),
                    components=[
                        tfd.Normal(loc=affine(loc_body), scale=affine(scale_body),
                                   validate_args=True, allow_nan_stats=False),
                        tfd.Normal(loc=affine(loc_tail), scale=affine(scale_tail),
                                   validate_args=True, allow_nan_stats=False)]),
                sample_shape=23507),
            reinterpreted_batch_ndims=2)
    ])

# define partial application (closure) over our data
@tf.function
def log_prob_normal(init_weights, init_loc_body, init_loc_tail, init_scale_body, init_scale_tail):
    return model_normal_normal().log_prob(
        [init_weights, init_loc_body, init_loc_tail, init_scale_body, init_scale_tail, data])

# bijector to map constrained parameters to the reals
# interval transformation
def tfp_interval(a, b):
    return tfb.Inline(
        inverse_fn=(
            lambda x: tf.math.log(x - a) - tf.math.log(b - x)),
        forward_fn=(
            lambda y: (b - a) * tf.sigmoid(y) + a),
        forward_log_det_jacobian_fn=(
            lambda x: tf.math.log(b - a) - 2 * tf.nn.softplus(-x) - x),
        forward_min_event_ndims=0,
        name="interval")

def sample_normal_normal(num_results, num_burnin_steps):
    hmc = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=log_prob_normal,
        num_leapfrog_steps=10,
        step_size=0.005)

    initial_state = [
        tf.constant(init_w, dtype=tf.float32, name='init_weights'),
        tf.constant(init_l_b, dtype=tf.float32, name='loc_body'),
        tf.constant(init_l_t, name='loc_tail'),
        tf.constant(init_s_b, name='scale_body'),
        tf.constant(init_s_t, name='scale_tail')
        # means * tf.ones([], dtype=tf.float32, name="init_means")
    ]

    # we specify bijectors to map the initial values to unconstrained space
    unconstraining_bijectors = [
        tfp_interval(tf.constant(0.08, tf.float32), tf.constant(0.11, tf.float32)),  # bijectors for the uniforms
        tfp_interval(tf.constant(4.8, tf.float32), tf.constant(5.1, tf.float32)),
        tfp_interval(tf.constant(6.4, tf.float32), tf.constant(7.3, tf.float32)),
        tfp_interval(tf.constant(0.08, tf.float32), tf.constant(0.1, tf.float32)),
        tfp_interval(tf.constant(0.1, tf.float32), tf.constant(0.2, tf.float32))
    ]

    kernel = tfp.mcmc.TransformedTransitionKernel(
        inner_kernel=hmc, bijector=unconstraining_bijectors)

    samples, kernel_results = tfp.mcmc.sample_chain(
        num_results=num_results,
        num_burnin_steps=num_burnin_steps,
        current_state=initial_state,
        kernel=kernel)

    acceptance_probs = tf.reduce_mean(
        tf.cast(kernel_results.inner_results.is_accepted, tf.float32), axis=0)
    return samples, acceptance_probs
Sigmoid bijector has low and high parameters. Can you try using that instead of the custom inline bijector? Those can be finicky at times (i see nothing wrong with your impl but haven't checked the math -- probably it's right ☺️).
Usually if HMC is rejecting too much the first thing to try is a lower step size. Try going down two orders of magnitude and see if it's accepting some. If you find a range where it's a bit better, wrap your kernel in tfp.mcmc.StepSizeAdaptation kernel initiated from that value and see how it goes.
I haven't thought much at all about the model setup. Could be something about that that is especially tricky to sample. Try the above and check back in?
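A minimal sketch of those two suggestions applied to the sampler above (the interval bounds come from the Uniform priors; the concrete class name is tfp.mcmc.SimpleStepSizeAdaptation, and num_adaptation_steps is an assumption):

unconstraining_bijectors = [
    tfb.Sigmoid(low=0.08, high=0.11),  # weights
    tfb.Sigmoid(low=4.8, high=5.1),    # loc_body
    tfb.Sigmoid(low=6.4, high=7.3),    # loc_tail
    tfb.Sigmoid(low=0.08, high=0.1),   # scale_body
    tfb.Sigmoid(low=0.1, high=0.2),    # scale_tail
]

hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=log_prob_normal,
    num_leapfrog_steps=10,
    step_size=5e-5)  # start roughly two orders of magnitude smaller

kernel = tfp.mcmc.TransformedTransitionKernel(
    inner_kernel=hmc, bijector=unconstraining_bijectors)
kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    kernel, num_adaptation_steps=int(0.8 * num_burnin_steps))
# Note: is_accepted then lives one level deeper, at
# kernel_results.inner_results.inner_results.is_accepted.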
Thank you! It works with the smaller step size and the Sigmoid bijector.
I have another question: I was playing around with reinterpreted_batch_ndims and I do not seem to understand the difference when I set it to different values.
For example, if I sample once, each uniform produces 1 value. Then the mixture constructs three Gaussian mixtures with roughly 20000 values each. The idea is that I want to find parameters for the mixture of Gaussians that are 'optimal' across all 3 datasets, each of which has results from a different time period. When I set reinterpreted_batch_ndims=1, I have batch_shape = [1] and event_shape = [3, 20000], whereas when I set reinterpreted_batch_ndims=2 I have batch_shape = [] and event_shape = [1, 3, 20000]. Both work when sample and log_prob are called, and the log_prob values differ for different reinterpreted_batch_ndims. However, the sampler only seems to work when reinterpreted_batch_ndims=2 and throws an error when reinterpreted_batch_ndims=1.
Would you mind explaining what the right approach is here and why this behaviour is intended, please?
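To illustrate what I am seeing with a toy stand-in for the mixture (a plain Normal with the same [1, 3] batch shape and 20000 draws, not the actual model):

base = tfd.Sample(tfd.Normal(loc=tf.zeros([1, 3]), scale=1.), sample_shape=20000)
# base.batch_shape == [1, 3], base.event_shape == [20000]
d1 = tfd.Independent(base, reinterpreted_batch_ndims=1)  # batch [1], event [3, 20000]
d2 = tfd.Independent(base, reinterpreted_batch_ndims=2)  # batch [],  event [1, 3, 20000]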
Found the answer to my question in the walk-through on shapes – thanks for the help!
I don't think concentration is supposed to be negative?
On Mon, Oct 24, 2022, Maria wrote:
Hi again! Apologies for asking too many questions. I want to use the generalised Pareto to fit the tail, however for some reason log_prob is -inf for this model. I am not sure why, because it does work outside the joint distribution. Would you mind giving me a hint?

return tfd.JointDistributionSequential([
    tfd.Uniform(low=0.01, high=0.02),   # loc
    tfd.Uniform(low=0.7, high=0.75),    # scale
    tfd.Uniform(low=-0.2, high=-0.35),  # concentration
    lambda conc, sc, loc: tfd.Independent(
        tfd.Sample(
            tfd.GeneralizedPareto(loc=affine(loc), scale=affine(sc),
                                  concentration=affine(conc),
                                  validate_args=True, allow_nan_stats=False),
            sample_shape=1277),
        reinterpreted_batch_ndims=2)
])
I managed to find my mistake. Concentration can be negative, however all values in x are then bounded from above by loc - scale/concentration, hence the error.
Hello! Is there an alternative in TFP to the posterior predictive samples from a model given a trace as in https://docs.pymc.io/en/v3/api/inference.html ?
The sample function on JointDistributions accepts arguments that let you "pin" certain values during sampling. You can use this to get posterior predictive samples, given posterior samples. The way we generate docs makes it a bit hard to suss out, but it's expressed most clearly (perhaps still not all that clearly...) in the sample_distributions docstring here. The same applies to the sample method, even though the docstrings don't say so (they're inherited with modifications from the base Distribution object...😬)
In short you can do things like this:
jds = tfd.JointDistributionSequential([
    tfd.Normal(0., 1., name='x'),
    lambda x: tfd.Normal(x, 1., name='y')
])

# (1) pin value of x, but not y:
jds.sample(10, value=[4, None])

# (2) pin value of y, but not x:
jds.sample(10, value=[None, -3])

# (3) same as (1), but use kwargs to pin, based on the `name` parameters of the constituent distributions:
jds.sample(10, x=4)

# (4) same as (2), but use kwargs to pin:
jds.sample(10, y=-3)
Constructing a valid value argument can be a bit of a hassle (you have to plug in Nones in just the right places), hence the latter form.
I see, many thanks!
I am trying to build a graphical model in TensorFlow Probability, where we first sample a number of positive (1) and negative (0) examples (count_i) from a Categorical distribution and then construct a Multinomial distribution (Y_i) depending on the value of count_i. These events (Y_i) are mutually exclusive:
Y_1 ~ Multinomial([0.9, 0.1, 0.05, 0.05, 0.1], total_count = tf.reduce_sum(tf.cast(count==1, tf.float32)))
Y_2 ~ Multinomial([0.99, 0.01, 0., 0., 0.], total_count = tf.reduce_sum(tf.cast(count==0, tf.float32)))
I have read these tutorials, however I am stuck with two issues:
My code is as follows. You can run these to replicate the behaviour: